EN.685.801.21.SP24
Professors Rodriguez, Saeed, Johnson
import warnings
warnings.filterwarnings('ignore')
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from collections import deque
import random
import plotly.graph_objects as go
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()
from tensorflow.keras.models import Sequential, clone_model
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import Adam
from sklearn.preprocessing import MinMaxScaler
In our previous module submission, we detailed the creation of the sentiment-enriched dataset of AAPL stock price data. To simplify ETL, we will use that data as the starting point for this module. The process here will be: load and clean that dataset, merge in Treasury bill rate data, engineer lagged and forecast features, and then train our RL agents on the result.
Reference data is sourced from: https://fiscaldata.treasury.gov/datasets/average-interest-rates-treasury-securities/average-interest-rates-on-u-s-treasury-securities
# load data and convert the '$'-prefixed price columns to numeric
sent = pd.read_csv('sentiment_aapl.csv')
for col in ['Close/Last', 'Open', 'High', 'Low']:
    sent[col] = pd.to_numeric(sent[col].str.replace('$', '', regex=False), errors='coerce')
# sort by date
sent['Date'] = pd.to_datetime(sent['Date'])
sent = sent.sort_values('Date')
sent.reset_index(drop=True, inplace=True)
sent.info()
display(sent)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 795 entries, 0 to 794
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Date        795 non-null    datetime64[ns]
 1   Close/Last  795 non-null    float64
 2   Volume      795 non-null    int64
 3   Open        795 non-null    float64
 4   High        795 non-null    float64
 5   Low         795 non-null    float64
 6   count       794 non-null    float64
 7   normalized  794 non-null    float64
 8   symbol      794 non-null    object
dtypes: datetime64[ns](1), float64(6), int64(1), object(1)
memory usage: 56.0+ KB
| Date | Close/Last | Volume | Open | High | Low | count | normalized | symbol | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-01-04 | 129.41 | 143301900 | 133.52 | 133.6116 | 126.760 | 2.0 | 0.9630 | AAPL.US |
| 1 | 2021-01-05 | 131.01 | 97664900 | 128.89 | 131.7400 | 128.430 | 6.0 | 0.8213 | AAPL.US |
| 2 | 2021-01-06 | 126.60 | 155088000 | 127.72 | 131.0499 | 126.382 | 2.0 | 0.9060 | AAPL.US |
| 3 | 2021-01-07 | 130.92 | 109578200 | 128.36 | 131.6300 | 127.860 | 3.0 | 0.5427 | AAPL.US |
| 4 | 2021-01-08 | 132.05 | 105158200 | 132.43 | 132.6300 | 130.230 | 8.0 | 0.9473 | AAPL.US |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 790 | 2024-02-26 | 181.16 | 40867420 | 182.24 | 182.7600 | 180.650 | 25.0 | 0.7475 | AAPL.US |
| 791 | 2024-02-27 | 182.63 | 54318850 | 181.10 | 183.9225 | 179.560 | 21.0 | 0.6107 | AAPL.US |
| 792 | 2024-02-28 | 181.42 | 48953940 | 182.51 | 183.1200 | 180.130 | 24.0 | 0.7662 | AAPL.US |
| 793 | 2024-02-29 | 180.75 | 136682600 | 181.27 | 182.5700 | 179.530 | 18.0 | 0.6934 | AAPL.US |
| 794 | 2024-03-01 | 179.66 | 73563080 | 179.55 | 180.5300 | 177.380 | 19.0 | 0.7424 | AAPL.US |
795 rows × 9 columns
# filter for dates to merge
data = sent[(sent['Date'] >= '2023-01-01') & (sent['Date'] <= '2023-12-31')].copy()  # copy to avoid SettingWithCopyWarning when adding columns later
data.reset_index(drop=True, inplace=True)
display(data)
| Date | Close/Last | Volume | Open | High | Low | count | normalized | symbol | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-01-03 | 125.07 | 112117500 | 130.280 | 130.9000 | 124.170 | 40.0 | 0.1977 | AAPL.US |
| 1 | 2023-01-04 | 126.36 | 89113630 | 126.890 | 128.6557 | 125.080 | 42.0 | 0.1409 | AAPL.US |
| 2 | 2023-01-05 | 125.02 | 80962710 | 127.130 | 127.7700 | 124.760 | 25.0 | 0.3639 | AAPL.US |
| 3 | 2023-01-06 | 129.62 | 87754720 | 126.010 | 130.2900 | 124.890 | 20.0 | 0.6063 | AAPL.US |
| 4 | 2023-01-09 | 130.15 | 70790810 | 130.465 | 133.4100 | 129.890 | 33.0 | 0.3944 | AAPL.US |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 245 | 2023-12-22 | 193.60 | 37149570 | 195.180 | 195.4100 | 192.970 | 19.0 | 0.5320 | AAPL.US |
| 246 | 2023-12-26 | 193.05 | 28919310 | 193.610 | 193.8900 | 192.830 | 32.0 | 0.1969 | AAPL.US |
| 247 | 2023-12-27 | 193.15 | 48087680 | 192.490 | 193.5000 | 191.090 | 42.0 | 0.1106 | AAPL.US |
| 248 | 2023-12-28 | 193.58 | 34049900 | 194.140 | 194.6600 | 193.170 | 36.0 | 0.4137 | AAPL.US |
| 249 | 2023-12-29 | 192.53 | 42672150 | 193.900 | 194.4000 | 191.725 | 24.0 | 0.7048 | AAPL.US |
250 rows × 9 columns
# load fed rate data
tbill = pd.read_csv('tbill_rates.csv')
# filter for tbill rate
tbill = tbill[tbill['Security Description'] == 'Treasury Bills']
tbill.reset_index(drop=True, inplace=True)
tbill.info()
display(tbill)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 120 entries, 0 to 119
Data columns (total 11 columns):
 #   Column                        Non-Null Count  Dtype
---  ------                        --------------  -----
 0   Record Date                   120 non-null    object
 1   Security Type Description    120 non-null    object
 2   Security Description          120 non-null    object
 3   Average Interest Rate Amount  120 non-null    float64
 4   Source Line Number            120 non-null    int64
 5   Fiscal Year                   120 non-null    int64
 6   Fiscal Quarter Number         120 non-null    int64
 7   Calendar Year                 120 non-null    int64
 8   Calendar Quarter Number       120 non-null    int64
 9   Calendar Month Number         120 non-null    int64
 10  Calendar Day Number           120 non-null    int64
dtypes: float64(1), int64(7), object(3)
memory usage: 10.4+ KB
| Record Date | Security Type Description | Security Description | Average Interest Rate Amount | Source Line Number | Fiscal Year | Fiscal Quarter Number | Calendar Year | Calendar Quarter Number | Calendar Month Number | Calendar Day Number | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2/29/2024 | Marketable | Treasury Bills | 5.384 | 1 | 2024 | 2 | 2024 | 1 | 2 | 29 |
| 1 | 1/31/2024 | Marketable | Treasury Bills | 5.411 | 1 | 2024 | 2 | 2024 | 1 | 1 | 31 |
| 2 | 12/31/2023 | Marketable | Treasury Bills | 5.437 | 1 | 2024 | 1 | 2023 | 4 | 12 | 31 |
| 3 | 11/30/2023 | Marketable | Treasury Bills | 5.451 | 1 | 2024 | 1 | 2023 | 4 | 11 | 30 |
| 4 | 10/31/2023 | Marketable | Treasury Bills | 5.437 | 1 | 2024 | 1 | 2023 | 4 | 10 | 31 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 115 | 7/31/2014 | Marketable | Treasury Bills | 0.066 | 1 | 2014 | 4 | 2014 | 3 | 7 | 31 |
| 116 | 6/30/2014 | Marketable | Treasury Bills | 0.068 | 1 | 2014 | 3 | 2014 | 2 | 6 | 30 |
| 117 | 5/31/2014 | Marketable | Treasury Bills | 0.072 | 1 | 2014 | 3 | 2014 | 2 | 5 | 31 |
| 118 | 4/30/2014 | Marketable | Treasury Bills | 0.080 | 1 | 2014 | 3 | 2014 | 2 | 4 | 30 |
| 119 | 3/31/2014 | Marketable | Treasury Bills | 0.084 | 1 | 2014 | 2 | 2014 | 1 | 3 | 31 |
120 rows × 11 columns
# create month-year column for easy merging
data['Month-Year'] = pd.to_datetime(data['Date']).dt.to_period('M').astype(str)
tbill['Month-Year'] = pd.to_datetime(tbill['Record Date']).dt.to_period('M').astype(str)
# left join tbill data
final_data = pd.merge(data, tbill[['Month-Year', 'Average Interest Rate Amount']],
on='Month-Year', how='left')
display(final_data)
| Date | Close/Last | Volume | Open | High | Low | count | normalized | symbol | Month-Year | Average Interest Rate Amount | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-01-03 | 125.07 | 112117500 | 130.280 | 130.9000 | 124.170 | 40.0 | 0.1977 | AAPL.US | 2023-01 | 4.242 |
| 1 | 2023-01-04 | 126.36 | 89113630 | 126.890 | 128.6557 | 125.080 | 42.0 | 0.1409 | AAPL.US | 2023-01 | 4.242 |
| 2 | 2023-01-05 | 125.02 | 80962710 | 127.130 | 127.7700 | 124.760 | 25.0 | 0.3639 | AAPL.US | 2023-01 | 4.242 |
| 3 | 2023-01-06 | 129.62 | 87754720 | 126.010 | 130.2900 | 124.890 | 20.0 | 0.6063 | AAPL.US | 2023-01 | 4.242 |
| 4 | 2023-01-09 | 130.15 | 70790810 | 130.465 | 133.4100 | 129.890 | 33.0 | 0.3944 | AAPL.US | 2023-01 | 4.242 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 245 | 2023-12-22 | 193.60 | 37149570 | 195.180 | 195.4100 | 192.970 | 19.0 | 0.5320 | AAPL.US | 2023-12 | 5.437 |
| 246 | 2023-12-26 | 193.05 | 28919310 | 193.610 | 193.8900 | 192.830 | 32.0 | 0.1969 | AAPL.US | 2023-12 | 5.437 |
| 247 | 2023-12-27 | 193.15 | 48087680 | 192.490 | 193.5000 | 191.090 | 42.0 | 0.1106 | AAPL.US | 2023-12 | 5.437 |
| 248 | 2023-12-28 | 193.58 | 34049900 | 194.140 | 194.6600 | 193.170 | 36.0 | 0.4137 | AAPL.US | 2023-12 | 5.437 |
| 249 | 2023-12-29 | 192.53 | 42672150 | 193.900 | 194.4000 | 191.725 | 24.0 | 0.7048 | AAPL.US | 2023-12 | 5.437 |
250 rows × 11 columns
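Since the rate data is monthly while the price data is daily, it is worth confirming that every trading day actually picked up a rate after the left join; pandas' `indicator` flag makes this a one-liner. A minimal sketch with hypothetical frames (column names mirror ours, the values are made up):

```python
import pandas as pd

# hypothetical daily rows and monthly rates, mirroring the Month-Year join above
daily = pd.DataFrame({'Month-Year': ['2023-01', '2023-01', '2023-02'],
                      'Close/Last': [125.07, 126.36, 150.00]})
monthly = pd.DataFrame({'Month-Year': ['2023-01'],
                        'Average Interest Rate Amount': [4.242]})

# indicator=True adds a '_merge' column flagging rows with no match on the right
merged = pd.merge(daily, monthly, on='Month-Year', how='left', indicator=True)
unmatched = merged[merged['_merge'] == 'left_only']
print(len(unmatched))  # 1 -> the 2023-02 row found no rate and would carry a NaN
```

Any `left_only` rows in the real merge would surface as NaNs in `Average Interest Rate Amount` and should be investigated before feature engineering.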
Now that we have our sourced features, we will move on to the next section and generate derived features to feed our eventual RL algorithms.
We will now create lagged features for close price and sentiment, and a delta fed-rate column capturing the month-over-month change in the T-bill rate (this will help train the RL agent to adjust its policy as the fed rate changes).
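One thing worth keeping in mind about the code below: `shift(n)` with positive `n` moves values downward, producing a true lag into the past, while `shift(-n)` pulls the value from `n` rows ahead. A two-line illustration:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])
print(s.shift(1).tolist())   # [nan, 10.0, 20.0, 30.0] -- prior rows (lag)
print(s.shift(-1).tolist())  # [20.0, 30.0, 40.0, nan] -- upcoming rows (lead)
```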
# pull base feature columns
close_series = final_data['Close/Last']
sentiment_series = final_data['normalized']
tbill_series = final_data['Average Interest Rate Amount']
# generate the ten lag features for close price and sentiment
# (note: shift(-n) pulls the value from n rows ahead)
lagged_df = pd.DataFrame({'Close': close_series})
for n in range(10, 0, -1):
    lagged_df[f'Close_Lagged_{n}'] = close_series.shift(-n)
lagged_df['Sentiment'] = sentiment_series
for n in range(10, 0, -1):
    lagged_df[f'Sent_Lagged_{n}'] = sentiment_series.shift(-n)
# generate MoM delta feature for fed rate (30 trading days)
lagged_df['TBill_Delta_30'] = tbill_series - tbill_series.shift(-30)
# drop NaNs, reset index
lagged_df.dropna(inplace=True)
lagged_df.reset_index(drop=True, inplace=True)
lagged_df
| Close | Close_Lagged_10 | Close_Lagged_9 | Close_Lagged_8 | Close_Lagged_7 | Close_Lagged_6 | Close_Lagged_5 | Close_Lagged_4 | Close_Lagged_3 | Close_Lagged_2 | ... | Sent_Lagged_9 | Sent_Lagged_8 | Sent_Lagged_7 | Sent_Lagged_6 | Sent_Lagged_5 | Sent_Lagged_4 | Sent_Lagged_3 | Sent_Lagged_2 | Sent_Lagged_1 | TBill_Delta_30 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 125.07 | 135.21 | 135.94 | 134.76 | 133.41 | 133.49 | 130.73 | 130.15 | 129.62 | 125.02 | ... | 0.5684 | 0.3790 | 0.3819 | 0.3867 | 0.4647 | 0.3944 | 0.6063 | 0.3639 | 0.1409 | -0.233 |
| 1 | 126.36 | 135.27 | 135.21 | 135.94 | 134.76 | 133.41 | 133.49 | 130.73 | 130.15 | 129.62 | ... | 0.2436 | 0.5684 | 0.3790 | 0.3819 | 0.3867 | 0.4647 | 0.3944 | 0.6063 | 0.3639 | -0.233 |
| 2 | 125.02 | 137.87 | 135.27 | 135.21 | 135.94 | 134.76 | 133.41 | 133.49 | 130.73 | 130.15 | ... | 0.7244 | 0.2436 | 0.5684 | 0.3790 | 0.3819 | 0.3867 | 0.4647 | 0.3944 | 0.6063 | -0.233 |
| 3 | 129.62 | 141.11 | 137.87 | 135.27 | 135.21 | 135.94 | 134.76 | 133.41 | 133.49 | 130.73 | ... | 0.3608 | 0.7244 | 0.2436 | 0.5684 | 0.3790 | 0.3819 | 0.3867 | 0.4647 | 0.3944 | -0.233 |
| 4 | 130.15 | 142.53 | 141.11 | 137.87 | 135.27 | 135.21 | 135.94 | 134.76 | 133.41 | 133.49 | ... | 0.5641 | 0.3608 | 0.7244 | 0.2436 | 0.5684 | 0.3790 | 0.3819 | 0.3867 | 0.4647 | -0.233 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 215 | 182.41 | 189.97 | 191.31 | 190.64 | 191.45 | 189.69 | 189.71 | 188.01 | 187.44 | 184.80 | ... | 0.5859 | 0.4360 | 0.1307 | 0.5721 | 0.7326 | 0.7577 | 0.6527 | 0.5723 | 0.5566 | 0.014 |
| 216 | 186.40 | 189.79 | 189.97 | 191.31 | 190.64 | 191.45 | 189.69 | 189.71 | 188.01 | 187.44 | ... | 0.6564 | 0.5859 | 0.4360 | 0.1307 | 0.5721 | 0.7326 | 0.7577 | 0.6527 | 0.5723 | 0.014 |
| 217 | 184.80 | 190.40 | 189.79 | 189.97 | 191.31 | 190.64 | 191.45 | 189.69 | 189.71 | 188.01 | ... | 0.7044 | 0.6564 | 0.5859 | 0.4360 | 0.1307 | 0.5721 | 0.7326 | 0.7577 | 0.6527 | 0.014 |
| 218 | 187.44 | 189.37 | 190.40 | 189.79 | 189.97 | 191.31 | 190.64 | 191.45 | 189.69 | 189.71 | ... | 0.6914 | 0.7044 | 0.6564 | 0.5859 | 0.4360 | 0.1307 | 0.5721 | 0.7326 | 0.7577 | 0.014 |
| 219 | 188.01 | 189.95 | 189.37 | 190.40 | 189.79 | 189.97 | 191.31 | 190.64 | 191.45 | 189.69 | ... | 0.4298 | 0.6914 | 0.7044 | 0.6564 | 0.5859 | 0.4360 | 0.1307 | 0.5721 | 0.7326 | 0.014 |
220 rows × 23 columns
Now that we have the base data, let us utilize an LSTM to include a forecasted next-day price for each state as an additional feature.
# lookback
num_prev_days = 10
# prepare input data for LSTM
X = []
y = []
for i in range(len(lagged_df) - num_prev_days):
X.append(lagged_df[['Close', 'Sentiment', 'TBill_Delta_30']].iloc[i:i+num_prev_days].values)
y.append(lagged_df['Close'].iloc[i + num_prev_days])
X = np.array(X)
y = np.array(y)
# normalize for LSTM training
scaler_X = MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler_X.fit_transform(X.reshape(-1, X.shape[-1])).reshape(X.shape)
scaler_y = MinMaxScaler(feature_range=(0, 1))
y_scaled = scaler_y.fit_transform(y.reshape(-1, 1)).flatten()
# ensure the 3D shape (samples, timesteps, features) expected by the LSTM
X_reshaped = X_scaled.reshape(X_scaled.shape[0], num_prev_days, X_scaled.shape[2])
# LSTM architecture
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(num_prev_days, X_scaled.shape[2])))
model.add(LSTM(units=50))
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mean_squared_error')
# train LSTM
model.fit(X_reshaped, y_scaled, epochs=100, batch_size=32, verbose=0)
# predict and inverse transform predictions
predicted_values_scaled = model.predict(X_reshaped)
predicted_values = scaler_y.inverse_transform(predicted_values_scaled)
# pad with (lookback - 1) NaNs in front and 1 NaN at the end so each row holds the next day's predicted close
padded_pred = np.concatenate((np.full((num_prev_days - 1, 1), np.nan), predicted_values, np.full((1, 1), np.nan)))
len(padded_pred)
# insert data into dataframe
lagged_df.insert(1, 'Pred_Close_Tmrw', padded_pred)
lagged_df = lagged_df.iloc[:-1]
lagged_df
| Close | Pred_Close_Tmrw | Close_Lagged_10 | Close_Lagged_9 | Close_Lagged_8 | Close_Lagged_7 | Close_Lagged_6 | Close_Lagged_5 | Close_Lagged_4 | Close_Lagged_3 | ... | Sent_Lagged_9 | Sent_Lagged_8 | Sent_Lagged_7 | Sent_Lagged_6 | Sent_Lagged_5 | Sent_Lagged_4 | Sent_Lagged_3 | Sent_Lagged_2 | Sent_Lagged_1 | TBill_Delta_30 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 125.07 | NaN | 135.21 | 135.94 | 134.76 | 133.41 | 133.49 | 130.73 | 130.15 | 129.62 | ... | 0.5684 | 0.3790 | 0.3819 | 0.3867 | 0.4647 | 0.3944 | 0.6063 | 0.3639 | 0.1409 | -0.233 |
| 1 | 126.36 | NaN | 135.27 | 135.21 | 135.94 | 134.76 | 133.41 | 133.49 | 130.73 | 130.15 | ... | 0.2436 | 0.5684 | 0.3790 | 0.3819 | 0.3867 | 0.4647 | 0.3944 | 0.6063 | 0.3639 | -0.233 |
| 2 | 125.02 | NaN | 137.87 | 135.27 | 135.21 | 135.94 | 134.76 | 133.41 | 133.49 | 130.73 | ... | 0.7244 | 0.2436 | 0.5684 | 0.3790 | 0.3819 | 0.3867 | 0.4647 | 0.3944 | 0.6063 | -0.233 |
| 3 | 129.62 | NaN | 141.11 | 137.87 | 135.27 | 135.21 | 135.94 | 134.76 | 133.41 | 133.49 | ... | 0.3608 | 0.7244 | 0.2436 | 0.5684 | 0.3790 | 0.3819 | 0.3867 | 0.4647 | 0.3944 | -0.233 |
| 4 | 130.15 | NaN | 142.53 | 141.11 | 137.87 | 135.27 | 135.21 | 135.94 | 134.76 | 133.41 | ... | 0.5641 | 0.3608 | 0.7244 | 0.2436 | 0.5684 | 0.3790 | 0.3819 | 0.3867 | 0.4647 | -0.233 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 214 | 182.89 | 180.867538 | 191.31 | 190.64 | 191.45 | 189.69 | 189.71 | 188.01 | 187.44 | 184.80 | ... | 0.4360 | 0.1307 | 0.5721 | 0.7326 | 0.7577 | 0.6527 | 0.5723 | 0.5566 | 0.4812 | 0.014 |
| 215 | 182.41 | 182.534225 | 189.97 | 191.31 | 190.64 | 191.45 | 189.69 | 189.71 | 188.01 | 187.44 | ... | 0.5859 | 0.4360 | 0.1307 | 0.5721 | 0.7326 | 0.7577 | 0.6527 | 0.5723 | 0.5566 | 0.014 |
| 216 | 186.40 | 184.481201 | 189.79 | 189.97 | 191.31 | 190.64 | 191.45 | 189.69 | 189.71 | 188.01 | ... | 0.6564 | 0.5859 | 0.4360 | 0.1307 | 0.5721 | 0.7326 | 0.7577 | 0.6527 | 0.5723 | 0.014 |
| 217 | 184.80 | 185.339432 | 190.40 | 189.79 | 189.97 | 191.31 | 190.64 | 191.45 | 189.69 | 189.71 | ... | 0.7044 | 0.6564 | 0.5859 | 0.4360 | 0.1307 | 0.5721 | 0.7326 | 0.7577 | 0.6527 | 0.014 |
| 218 | 187.44 | 186.127121 | 189.37 | 190.40 | 189.79 | 189.97 | 191.31 | 190.64 | 191.45 | 189.69 | ... | 0.6914 | 0.7044 | 0.6564 | 0.5859 | 0.4360 | 0.1307 | 0.5721 | 0.7326 | 0.7577 | 0.014 |
219 rows × 24 columns
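The padding arithmetic above can be sanity-checked in isolation: with 220 rows and a 10-day lookback, the model emits 210 predictions, and 9 leading plus 1 trailing NaN restores the original length (toy values below, not the notebook's actual arrays):

```python
import numpy as np

n_rows, lookback = 220, 10
preds = np.arange(n_rows - lookback, dtype=float).reshape(-1, 1)  # stand-in for the 210 model outputs
padded = np.concatenate((np.full((lookback - 1, 1), np.nan),  # 9 leading NaNs
                         preds,
                         np.full((1, 1), np.nan)))            # 1 trailing NaN
print(len(padded))  # 220 -- one value per row; row i holds the forecast for day i+1
```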
Finally, we drop the remaining NaN rows so the RL agent never trains on a NaN state.
# drop off all NaNs and continue with shortened dataset
lagged_df = lagged_df.iloc[9:]
lagged_df.reset_index(drop=True, inplace=True)
lagged_df
| Close | Pred_Close_Tmrw | Close_Lagged_10 | Close_Lagged_9 | Close_Lagged_8 | Close_Lagged_7 | Close_Lagged_6 | Close_Lagged_5 | Close_Lagged_4 | Close_Lagged_3 | ... | Sent_Lagged_9 | Sent_Lagged_8 | Sent_Lagged_7 | Sent_Lagged_6 | Sent_Lagged_5 | Sent_Lagged_4 | Sent_Lagged_3 | Sent_Lagged_2 | Sent_Lagged_1 | TBill_Delta_30 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 135.94 | 134.996170 | 144.29 | 143.00 | 145.93 | 143.96 | 141.86 | 142.53 | 141.11 | 137.87 | ... | 0.3511 | 0.3045 | 0.5189 | 0.4301 | 0.6312 | 0.5641 | 0.3608 | 0.7244 | 0.2436 | -0.392 |
| 1 | 135.21 | 135.827621 | 145.43 | 144.29 | 143.00 | 145.93 | 143.96 | 141.86 | 142.53 | 141.11 | ... | 0.3100 | 0.3511 | 0.3045 | 0.5189 | 0.4301 | 0.6312 | 0.5641 | 0.3608 | 0.7244 | -0.392 |
| 2 | 135.27 | 136.718994 | 150.82 | 145.43 | 144.29 | 143.00 | 145.93 | 143.96 | 141.86 | 142.53 | ... | 0.5118 | 0.3100 | 0.3511 | 0.3045 | 0.5189 | 0.4301 | 0.6312 | 0.5641 | 0.3608 | -0.392 |
| 3 | 137.87 | 137.518051 | 154.50 | 150.82 | 145.43 | 144.29 | 143.00 | 145.93 | 143.96 | 141.86 | ... | 0.3336 | 0.5118 | 0.3100 | 0.3511 | 0.3045 | 0.5189 | 0.4301 | 0.6312 | 0.5641 | -0.392 |
| 4 | 141.11 | 138.487869 | 151.73 | 154.50 | 150.82 | 145.43 | 144.29 | 143.00 | 145.93 | 143.96 | ... | 0.2549 | 0.3336 | 0.5118 | 0.3100 | 0.3511 | 0.3045 | 0.5189 | 0.4301 | 0.6312 | -0.392 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 205 | 182.89 | 180.867538 | 191.31 | 190.64 | 191.45 | 189.69 | 189.71 | 188.01 | 187.44 | 184.80 | ... | 0.4360 | 0.1307 | 0.5721 | 0.7326 | 0.7577 | 0.6527 | 0.5723 | 0.5566 | 0.4812 | 0.014 |
| 206 | 182.41 | 182.534225 | 189.97 | 191.31 | 190.64 | 191.45 | 189.69 | 189.71 | 188.01 | 187.44 | ... | 0.5859 | 0.4360 | 0.1307 | 0.5721 | 0.7326 | 0.7577 | 0.6527 | 0.5723 | 0.5566 | 0.014 |
| 207 | 186.40 | 184.481201 | 189.79 | 189.97 | 191.31 | 190.64 | 191.45 | 189.69 | 189.71 | 188.01 | ... | 0.6564 | 0.5859 | 0.4360 | 0.1307 | 0.5721 | 0.7326 | 0.7577 | 0.6527 | 0.5723 | 0.014 |
| 208 | 184.80 | 185.339432 | 190.40 | 189.79 | 189.97 | 191.31 | 190.64 | 191.45 | 189.69 | 189.71 | ... | 0.7044 | 0.6564 | 0.5859 | 0.4360 | 0.1307 | 0.5721 | 0.7326 | 0.7577 | 0.6527 | 0.014 |
| 209 | 187.44 | 186.127121 | 189.37 | 190.40 | 189.79 | 189.97 | 191.31 | 190.64 | 191.45 | 189.69 | ... | 0.6914 | 0.7044 | 0.6564 | 0.5859 | 0.4360 | 0.1307 | 0.5721 | 0.7326 | 0.7577 | 0.014 |
210 rows × 24 columns
Let's also quickly compare these predictions against the real close prices. Note that the predicted curve will appear shifted forward: each day's state intentionally carries tomorrow's predicted close, giving the agent a one-day peek into the future.
plt.figure(figsize=(15, 10))
# plot actual close prices
plt.plot(lagged_df['Close'], label='Actual Close Prices', color='blue')
# plot predicted next-day close prices (same index as the truncated dataframe)
plt.plot(lagged_df['Pred_Close_Tmrw'], label='Predicted Close Prices', color='red')
# set labels and title
plt.xlabel('Days')
plt.ylabel('Price')
plt.title('Actual vs Predicted Close Prices')
plt.show()
Now that we have sourced, loaded, cleaned, and feature-engineered the data, this dataframe can be used to train a Q-Learning and Deep Q-Learning algorithm.
To understand how this is set up, we first define a tabular Q-Learning agent:
class QLearningAgent:
def __init__(self, state_size, action_size, learning_rate=0.1, discount_factor=0.99, epsilon=1.0, epsilon_decay=0.995, epsilon_min=0.5):
self.state_size = state_size
self.action_size = action_size
self.q_table = np.zeros((state_size, len(action_size)))
self.learning_rate = learning_rate
self.discount_factor = discount_factor
self.epsilon = epsilon
self.epsilon_decay = epsilon_decay
self.epsilon_min = epsilon_min
# choose action using epsilon greedy
def choose_action(self, state):
if np.random.rand() <= self.epsilon:
return np.random.choice(self.action_size) # EXPLORE
else:
return np.argmax(self.q_table[state]) # EXPLOIT
# update q-table using Bellman
def update_q_table(self, state, action, reward, next_state):
best_next_action = np.argmax(self.q_table[next_state])
td_target = reward + self.discount_factor * self.q_table[next_state, best_next_action]
td_error = td_target - self.q_table[state, action]
self.q_table[state, action] += self.learning_rate * td_error
if self.epsilon > self.epsilon_min:
self.epsilon *= self.epsilon_decay
# save and load tables just for testing not needed
def save_q_table(self, file_path):
np.save(file_path, self.q_table)
def load_q_table(self, file_path):
if os.path.exists(file_path):
self.q_table = np.load(file_path)
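The `update_q_table` method above is the standard tabular TD(0) update, Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',·) − Q(s,a)]. A single hand-checkable step on a toy table (illustrative numbers only, not the agent's actual state):

```python
import numpy as np

q = np.zeros((3, 2))             # toy table: 3 states, 2 actions
lr, gamma = 0.1, 0.99            # same roles as learning_rate / discount_factor above
state, action, reward, next_state = 0, 1, 5.0, 1

td_target = reward + gamma * q[next_state].max()  # 5.0, since q starts all-zero
td_error = td_target - q[state, action]           # 5.0
q[state, action] += lr * td_error
print(q[state, action])  # 0.5
```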
# original reward: signed profit based on yesterday's action and today's price move
def calculate_reward_old1(prev_action, current_price, prev_price):
    if prev_action == 0:    # Buy: reward is today's price minus yesterday's (positive if price rose)
        reward = round(current_price - prev_price, 2)
    elif prev_action == 1:  # Sell: reward is yesterday's price minus today's (positive if price fell)
        reward = round(prev_price - current_price, 2)
    else:
        reward = 0          # no action
    return reward
# revised reward: rewards buying low and selling high, with per-action rewards clipped at 0 instead of going negative (the portfolio term below remains unclipped)
def calculate_reward(prev_action, prev_price, current_price, num_stocks_held):
if prev_action == 0: # Buy
reward = max(0, current_price - prev_price) # potential profit
elif prev_action == 1: # Sell
reward = max(0, prev_price - current_price) # loss avoidance
else:
reward = 0 # Hold
# additional reward for overall portfolio increase
if num_stocks_held > 0:
reward += (current_price - prev_price) * num_stocks_held
return round(reward, 2)
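A few hand-worked cases make the clipping behavior concrete. This standalone helper mirrors the `calculate_reward` rule above (re-implemented here so the snippet runs on its own):

```python
def reward(prev_action, prev_price, current_price, num_stocks_held):
    # mirrors the rule above: clipped action reward plus unclipped portfolio term
    if prev_action == 0:                       # Buy
        r = max(0, current_price - prev_price)
    elif prev_action == 1:                     # Sell
        r = max(0, prev_price - current_price)
    else:                                      # Hold / no action
        r = 0
    if num_stocks_held > 0:
        r += (current_price - prev_price) * num_stocks_held
    return round(r, 2)

print(reward(0, 100.0, 103.5, 2))  # 10.5: 3.5 action reward + 7.0 portfolio gain
print(reward(1, 100.0, 98.0, 0))   # 2.0: selling before a drop, nothing held
print(reward(0, 100.0, 98.0, 1))   # -2.0: only the portfolio term can go negative
```

So the clipping removes negative *action* rewards, but an agent holding stock through a drawdown is still penalized through the portfolio term.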
def train_q_learning(state_matrix, action_space, num_episodes, max_budget, q_table_file='q_table.npy'):
# initialize Q-learning agent
agent = QLearningAgent(state_size=state_matrix.shape[0], action_size=action_space)
profits = []
rewards = []
for episode in range(num_episodes):
total_reward = 0
num_stocks_held = 0
available_budget = max_budget
state = 0 # initial state
episode_profit = [] # list to store profit/loss of each action in the episode
buys = 0 # count of buy actions
sells = 0 # count of sell actions
prev_action = None
for t in range(len(state_matrix) - 1): # iterate over each time step
# update action space based on the number of stocks held
# (note: choose_action does not consult modified_action_space; invalid actions
# are instead neutralized by the budget/holding guards on the buy/sell branches below)
if available_budget < state_matrix[t, 0]: # if budget is insufficient for buying
modified_action_space = np.delete(action_space, 0) # remove 'buy' action
elif num_stocks_held == 0:
modified_action_space = np.delete(action_space, 1) # remove 'sell' action
else:
modified_action_space = action_space # all actions are available
action = agent.choose_action(state)
# calculate the reward based on yesterday's chosen action
reward = calculate_reward(prev_action, state_matrix[t-1, 0], state_matrix[t, 0], num_stocks_held)
total_reward += reward
value = 0
# Calculate value based on action
if action == 0 and available_budget >= state_matrix[t, 0]: # Buy
value = state_matrix[t, 0] # value of purchased stock on day of purchase
num_stocks_held += 1
available_budget -= value
buys += 1
elif action == 1 and num_stocks_held > 0: # Sell
value = state_matrix[t, 0] # value of sold stock on day of sale
num_stocks_held -= 1
available_budget += value
sells += 1
# update episode profit
episode_profit.append(reward)
next_state = state + 1
# update q-table
agent.update_q_table(state, action, reward, next_state)
# update previous action
prev_action = action
state = next_state
# estimate total value by adjusting for the last action
portfolio_value = available_budget + (num_stocks_held * state_matrix[-1, 0])
if episode % 100 == 0:
print(f"-------------------------------------------------------------------------------------------------------------------")
print(f"Episode {episode + 1}, # Stocks: {num_stocks_held}, Total Rewards: {total_reward:.2f}, Portfolio Value: {portfolio_value:.2f}, Available Budget: {available_budget:.2f}, Buys: {buys}, Sells: {sells}")
print(f"-------------------------------------------------------------------------------------------------------------------")
print("Daily Profits:", episode_profit)
agent.save_q_table(q_table_file)
profits.append(portfolio_value)
rewards.append(total_reward)
return agent, profits, rewards
agent1, profits1, rewards1 = train_q_learning(lagged_df.values, action_space=[0, 1, 2],
num_episodes=3000, max_budget=1000, q_table_file='q_table.npy')
-------------------------------------------------------------------------------------------------------------------
Episode 1, # Stocks: 6, Total Rewards: 306.11, Portfolio Value: 1190.84, Available Budget: 66.20, Buys: 49, Sells: 43
-------------------------------------------------------------------------------------------------------------------
Daily Profits: [0, 0, 0, 0, 6.48, 1.42, -0.67, 0, 0, 0, 2.58, 3.42, 5.39, ...]
-------------------------------------------------------------------------------------------------------------------
Episode 101, # Stocks: 6, Total Rewards: 233.74, Portfolio Value: 1133.99, Available Budget: 9.35, Buys: 62, Sells: 56
-------------------------------------------------------------------------------------------------------------------
Daily Profits: [0, -0.73, 0, 0, 6.48, 1.42, -1.34, 4.2, 3.94, -5.86, ...]
-------------------------------------------------------------------------------------------------------------------
Episode 201, # Stocks: 4, Total Rewards: 173.34, Portfolio Value: 1084.20, Available Budget: 334.44, Buys: 61, Sells: 57
-------------------------------------------------------------------------------------------------------------------
Daily Profits: [0, 0.73, 0, 0, 0, 0, 0, 0, 0, 2.93, ...]
-------------------------------------------------------------------------------------------------------------------
Episode 301, # Stocks: 3, Total Rewards: 213.93, Portfolio Value: 1089.95, Available Budget: 527.63, Buys: 59, Sells: 56
-------------------------------------------------------------------------------------------------------------------
Daily Profits: [0, 0.73, 0, 0, 0, 0, 0, 0, 0, 0, ...]
-------------------------------------------------------------------------------------------------------------------
Episode 401, # Stocks: 2, Total Rewards: 140.24, Portfolio Value: 1000.86, Available Budget: 625.98, Buys: 62, Sells: 60
-------------------------------------------------------------------------------------------------------------------
Daily Profits: [0, 0, 0, 0, 0, 0, 0.67, 4.2, 5.91, 0.0, ...]
1.32, -7.02, -4.21, 1.33, 2.07, 0, 0, 7.2, -0.92, 0, 0, 2.14, -0.96, 15.96, -4.8] ------------------------------------------------------------------------------------------------------------------- Episode 501, # Stocks: 5, Total Rewards: 276.42, Portfolio Value: 1151.20, Available Budget: 214.00, Buys: 65, Sells: 60 ------------------------------------------------------------------------------------------------------------------- Daily Profits: [0, 0.73, 0, 5.2, 9.72, 5.68, -2.01, 6.3, 9.85, -14.65, 5.16, 4.56, 32.34, 14.72, -5.54, 14.6, -5.46, -3.15, 0.28, 2.84, 0.65, 4.26, -3.24, -2.32, -12.21, 2.15, 1.96, -5.38, 2.42, 0.0, -4.2, 0.6, 15.36, 5.6, 0.0, 0, -2.28, -4.18, 3.94, 8.48, 1.2, 5.72, -1.7, 2.4, 0, 1.45, 0, 0, -1.97, 0.63, 0, 3.18, 0, 0, 0.54, 1.87, 0, 2.63, 1.23, -0.7, 5.46, 0.35, 0.04, 1.24, 1.16, -1.96, 0.0, 0, 0, 0, 9.3, 1.27, 0.09, 1.05, -1.09, -3.32, 7.78, -0.14, 0.0, 0, 0, 1.18, -0.5, 0, 1.24, 7.08, 0.22, -1.92, -7.92, 0.84, 2.3, 2.44, 0, 0, 5.68, 2.58, -2.74, -0.74, -2.78, 5.5, 0.78, 11.32, -1.92, 1.92, 6.18, -4.36, 0.54, -5.25, 15.2, -1.6, -4.23, 11.16, 3.57, 0.68, 17.52, -4.53, -3.39, 1.44, -4.52, -10.35, -2.65, 8.45, 3.08, 0.45, 6.6, -0.78, 2.74, 0.0, -2.38, 3.24, 2.61, 4.4, -2.56, 5.22, 1.24, -1.69, -9.07, -4.23, -9.18, 0.0, 0.95, 1.61, 0.22, 0.18, 3.34, -2.01, -0.88, 2.57, 0.98, 0, 2.78, 3.89, -4.74, 6.69, 1.58, 11.79, 14.12, 0.44, 1.59, 0, 0, 5.35, 0, 2.36, -6.12, -6.27, 3.06, -1.46, 11.84, 2.2, -10.74, -4.68, 2.58, 3.87, -16.48, -6.12, 1.04, 1.56, 12.7, -5.4, 3.78, 6.25, 7.74, 7.5, -1.2, 7.05, 2.73, -7.44, -0.65, -9.42, -7.86, -2.28, -15.48, 0.84, 2.2, -11.7, -12.63, 7.98, 10.35, 1.92, 12.8, 21.6, -4.6, 12.9, 12.95, 5.35, -2.4, 19.95, -4.8] ------------------------------------------------------------------------------------------------------------------- Episode 601, # Stocks: 3, Total Rewards: 326.55, Portfolio Value: 1168.75, Available Budget: 606.43, Buys: 65, Sells: 62 
------------------------------------------------------------------------------------------------------------------- Daily Profits: [0, 0.73, 0.12, 7.8, 6.48, 5.68, -2.01, 10.5, 5.91, -2.93, 2.58, 1.14, 16.17, 14.72, -2.77, 2.92, 2.73, 1.05, 0, 0, 0, 0, 1.62, -1.16, 4.07, 0, 0.98, -5.38, 2.42, -1.53, -6.3, 3.0, 30.72, 14.0, -6.69, 5.08, -11.4, -12.54, 11.82, 16.96, 3.2, 17.16, -5.1, 14.4, 9.4, -8.7, 7.7, 6.6, -5.91, -3.15, 12.48, 4.77, 12.7, 7.62, -2.7, -5.61, 2.7, -10.52, -4.92, -1.4, 10.92, -0.7, 0.08, 3.72, 5.8, -4.9, -4.89, 1.86, -4.68, -0.02, 23.25, 5.08, -0.18, -3.15, -1.09, 0.0, 23.34, -0.21, -6.92, 5.35, 0.58, -1.18, -1.5, 0.0, 2.48, 4.72, 0.11, 0.96, 2.64, 0, 0, 0, 0, 0.05, 0, 0, 0, 0, 0, 0, 0.78, 0, 0, 0, 0, -1.09, 0.27, -3.15, 9.12, -0.96, -4.23, 13.95, 4.76, 2.04, 17.52, -6.04, -2.26, 2.4, -4.52, -4.14, -0.53, 3.38, 0.77, 0, 0, -0.26, 4.11, 0.0, -2.38, 0.81, 0.87, 0, 0, 0, 1.24, -1.69, -9.07, -1.41, 0.0, 3.14, 0, 1.61, 0, 0, 3.34, -2.01, -1.76, 0.0, 1.47, 2.7, 5.56, 19.45, -9.48, 11.15, 9.48, 27.51, 17.65, 0.88, 9.54, 0.96, -33.95, -16.05, 2.48, 4.72, -6.12, -2.09, 6.12, -2.19, 5.92, 4.4, -10.74, -4.68, 1.72, 2.58, 0.0, 1.53, 0, 0, 5.08, -1.35, 1.26, 1.25, 2.58, 4.5, 0.0, 4.23, 1.82, -5.58, -0.52, -6.28, -5.24, -1.52, -10.32, 0.36, 0.88, 0.0, 4.21, 0, 0, 0, 0, 7.2, -1.84, 10.32, 7.77, 2.14, -1.44, 7.98, -4.8] ------------------------------------------------------------------------------------------------------------------- Episode 701, # Stocks: 3, Total Rewards: 347.32, Portfolio Value: 1214.81, Available Budget: 652.49, Buys: 68, Sells: 65 ------------------------------------------------------------------------------------------------------------------- Daily Profits: [0, 0.73, 0, 5.2, 9.72, 5.68, -2.01, 10.5, 11.82, -8.79, 5.16, 4.56, 32.34, 25.76, -11.08, 11.68, -5.46, -4.2, 0.56, 11.36, -1.3, 4.26, 0.0, -2.32, 0.0, 0, 0.98, 2.69, 0, -0.51, 2.1, 1.2, 0, 0, -2.23, 3.81, -6.84, -8.36, 7.88, 12.72, 2.8, 17.16, -5.95, 16.8, 15.04, -10.15, 8.8, 7.92, 
-11.82, -2.52, 15.6, 6.36, 10.16, 5.08, -2.16, -3.74, 1.8, -5.26, -2.46, 0.0, 5.46, -0.35, 0.02, 0, 2.32, -1.96, -3.26, 1.24, -1.56, -0.03, 13.95, 3.81, -0.09, -2.1, 0.0, -3.32, 31.12, -0.28, -3.46, 3.57, 0.39, 0.0, -0.5, 0, 1.24, 7.08, 0.44, -0.96, 0.0, 0.28, 1.15, 2.44, 0, 0.05, 0, 0, -1.37, -0.74, -2.78, 5.5, 0.39, 8.49, -1.44, 1.28, 4.12, -2.18, 0.09, 1.05, 0, 0, -1.41, 2.79, 3.57, 0.68, 17.52, -6.04, -4.52, 1.92, -2.26, -8.28, -2.65, 11.83, 3.85, 0.6, 9.9, -0.78, 2.74, 0.0, -1.19, 2.43, 3.48, 2.64, -1.28, 2.61, 1.86, -1.69, -6.05, 0.0, -18.36, -9.42, 1.9, 0.0, -0.44, 0.0, 0, -2.01, -1.76, 0.0, 0, 0, 2.78, 0, 0, 4.46, 1.58, 11.79, 3.53, 0.66, 1.59, 0, 0, -5.35, 0.62, 0, 3.06, -2.09, 4.59, -1.46, 2.96, 0, 0, 0, 1.72, 1.29, 4.12, -1.53, 0.26, 0, 5.08, -1.35, 3.78, 2.5, 5.16, 6.0, -0.6, 2.82, 1.82, -5.58, -0.39, -4.71, -1.31, -1.14, -2.58, 0.48, 0.88, 0.0, 4.21, 0, 4.14, 1.44, 6.4, 14.4, -0.92, 10.32, 5.18, 2.14, -0.96, 7.98, -4.8] ------------------------------------------------------------------------------------------------------------------- Episode 801, # Stocks: 5, Total Rewards: 277.70, Portfolio Value: 1167.81, Available Budget: 230.61, Buys: 68, Sells: 63 ------------------------------------------------------------------------------------------------------------------- Daily Profits: [0, 0.73, 0, 5.2, 0, 0, 0, 0, 0, -2.93, 0, 0, 0, 0, 2.77, 0, -2.73, -1.05, 0, 0, 0, 0, -1.62, 1.16, 4.07, 0, 0, 2.69, 0, -0.51, 2.1, 1.2, 15.36, 5.6, -6.69, 6.35, -11.4, -12.54, 9.85, 14.84, 2.0, 11.44, -4.25, 12.0, 7.52, -7.25, 5.5, 6.6, -11.82, -3.78, 15.6, 11.13, 12.7, 5.08, -2.16, -7.48, 3.6, -13.15, -3.69, -3.5, 27.3, -1.05, 0.06, 3.72, 5.8, -3.92, -6.52, 1.24, -3.12, -0.04, 18.6, 5.08, -0.18, -3.15, -4.36, -8.3, 31.12, -0.28, -3.46, 8.92, 0.58, -1.18, -1.5, 0.0, 2.48, 7.08, 0.55, -1.92, -2.64, 0.56, 4.6, 4.88, 1.87, -0.1, 5.68, 1.72, 0.0, -0.74, -2.78, 2.75, 0, 5.66, -0.96, 2.56, 10.3, -4.36, 0.27, -1.05, 12.16, -0.96, -4.23, 8.37, 5.95, 2.04, 30.66, -9.06, -6.78, 2.4, 
-5.65, -6.21, -2.65, 6.76, 3.08, 0.45, 16.5, -1.3, 5.48, -3.94, -4.76, 3.24, 2.61, 2.64, -1.28, 5.22, 1.24, -2.53, -3.02, 0.0, -9.18, -6.28, 3.8, -1.61, 0.0, -0.18, 5.01, -6.03, -0.88, -7.71, 1.47, 4.05, 2.78, 3.89, -4.74, 2.23, 4.74, 3.93, 0, 0.44, 1.59, 0.72, 0.0, -5.35, 1.86, 4.72, -3.06, 0.0, 1.53, -0.73, 2.96, 3.3, 0.0, -3.12, 3.44, 6.45, -8.24, -4.59, 0.52, 0.52, 7.62, -4.05, 2.52, 2.5, 5.16, 6.0, -2.4, 5.64, 3.64, -7.44, -0.26, -4.71, -5.24, -0.76, -2.58, 0.24, 0.44, -2.34, 4.21, 0, 0, 0.96, 9.6, 14.4, -3.68, 7.74, 5.18, 4.28, -1.92, 15.96, -8.0] ------------------------------------------------------------------------------------------------------------------- Episode 901, # Stocks: 3, Total Rewards: 461.77, Portfolio Value: 1320.13, Available Budget: 757.81, Buys: 70, Sells: 67 ------------------------------------------------------------------------------------------------------------------- Daily Profits: [0, -0.73, 0.18, 10.4, 16.2, 8.52, -2.01, 8.4, 5.91, -11.72, 5.16, 6.84, 26.95, 18.4, -8.31, 8.76, -10.92, -2.1, 0.42, 14.2, -2.6, 6.39, -1.62, -2.32, 0.0, 0, 0, 2.69, 2.42, -1.02, 0.0, 0.6, 15.36, 11.2, -2.23, 5.08, -9.12, -10.45, 13.79, 10.6, 2.8, 14.3, -4.25, 12.0, 13.16, -10.15, 7.7, 7.92, -13.79, -4.41, 18.72, 9.54, 12.7, 5.08, -1.08, -1.87, 1.8, -7.89, -1.23, 0.0, 5.46, -0.7, 0.02, 0, 0, 0, -1.63, 0.93, 0.0, -0.01, 13.95, 5.08, -0.36, -5.25, -3.27, -8.3, 54.46, -0.42, -6.92, 7.14, 0.78, -5.9, -3.0, 0.0, 4.34, 16.52, 0.77, -3.84, -7.92, 1.68, 4.6, 14.64, 13.09, -0.3, 14.2, 3.44, -5.48, -1.48, -6.95, 13.75, 2.73, 16.98, -2.88, 3.84, 10.3, -6.54, 0.63, -6.3, 15.2, -0.96, -5.64, 8.37, 3.57, 1.7, 26.28, -9.06, -4.52, 3.36, -6.78, -8.28, -1.59, 5.07, 3.85, 0.6, 13.2, -1.3, 5.48, -3.94, -4.76, 3.24, 5.22, 3.52, -2.56, 13.05, 1.86, -3.38, -12.1, -2.82, -9.18, -6.28, 1.9, 0.0, -0.44, -0.36, 6.68, -2.01, -1.76, 0.0, 0.49, 1.35, 0, 7.78, 4.74, 0, 0, 7.86, 0, 0, 0, 0, -6.79, -10.7, 1.24, 1.18, -3.06, 2.09, 0, 0.73, 0, 0, -3.58, 1.56, 0, 2.58, -8.24, -3.06, 
0.52, 2.08, 12.7, -6.75, 8.82, 7.5, 12.9, 10.5, -3.6, 7.05, 4.55, -9.3, -0.78, -9.42, -7.86, -2.28, -10.32, 0.6, 2.2, -7.02, -21.05, 5.32, 12.42, 3.36, 19.2, 28.8, -6.44, 18.06, 15.54, 5.35, -2.88, 19.95, -4.8] ------------------------------------------------------------------------------------------------------------------- Episode 1001, # Stocks: 6, Total Rewards: 270.73, Portfolio Value: 1152.46, Available Budget: 27.82, Buys: 68, Sells: 62 ------------------------------------------------------------------------------------------------------------------- Daily Profits: [0, -0.73, 0, 0, 0, 0, 0, 4.2, 5.91, -8.79, 6.45, 3.42, 10.78, 7.36, -5.54, 2.92, -5.46, -2.1, 0.56, 14.2, -2.6, 12.78, -9.72, -6.96, -24.42, 2.58, 2.45, -8.07, 7.26, -3.06, -12.6, 4.2, 30.72, 19.6, -13.38, 7.62, -9.12, -6.27, 11.82, 14.84, 2.4, 17.16, -5.1, 14.4, 13.16, -5.8, 4.4, 5.28, -3.94, -0.63, 12.48, 4.77, 12.7, 3.81, -0.54, -5.61, 1.8, 0.0, 1.23, -0.7, 0, 0, 0, 0, 2.32, -0.98, 1.63, 0, 1.56, -0.01, 0, 0, -0.09, -1.05, -1.09, -1.66, 7.78, 0.07, 1.73, 3.57, 0.58, 0.0, -1.0, 0.0, 2.48, 4.72, 0.44, -3.84, -10.56, 1.68, 4.6, 14.64, 7.48, -0.2, 8.52, 1.72, -2.74, -1.11, -4.17, 5.5, 0.39, 8.49, -1.44, 1.92, 4.12, 0.0, 0.09, 1.05, 6.08, 0.32, -1.41, 8.37, 1.19, 1.02, 17.52, -6.04, -4.52, 1.44, -1.13, -6.21, -1.59, 8.45, 4.62, 0.75, 13.2, -1.04, 5.48, -9.85, -5.95, 3.24, 3.48, 2.64, -3.84, 13.05, 3.72, -2.53, -6.05, -1.41, -18.36, -9.42, 4.75, -6.44, -1.1, -0.54, 5.01, -8.04, -4.4, -7.71, 1.47, 2.7, 2.78, 7.78, 0.0, 0, 3.16, 11.79, 14.12, 0.66, 3.18, 0.48, -20.37, -21.4, 1.86, 3.54, -9.18, -2.09, 3.06, 0.0, 2.96, 3.3, 0.0, -3.12, 1.72, 5.16, -12.36, -6.12, 1.56, 3.64, 17.78, -5.4, 8.82, 7.5, 18.06, 7.5, -1.8, 8.46, 6.37, -11.16, -0.52, -7.85, -3.93, -1.9, -15.48, 0.84, 2.64, -14.04, -16.84, 5.32, 8.28, 1.92, 19.2, 25.2, -3.68, 12.9, 12.95, 5.35, -2.4, 19.95, -9.6] ------------------------------------------------------------------------------------------------------------------- Episode 1101, # 
Stocks: 6, Total Rewards: 275.61, Portfolio Value: 1173.74, Available Budget: 49.10, Buys: 60, Sells: 54 ------------------------------------------------------------------------------------------------------------------- Daily Profits: [0, 0, 0, 0, 6.48, 0, -0.67, 2.1, 0, -2.93, 1.29, 3.42, 5.39, 11.04, -5.54, 2.92, 2.73, 0, 0.28, 2.84, -0.65, 0, 0, 0, -4.07, 0, 0.98, 2.69, 0, 0.51, 2.1, 0, 0, 5.6, -2.23, 1.27, -4.56, 0.0, 5.91, 4.24, 0.8, 5.72, -1.7, 9.6, 3.76, -2.9, 2.2, 5.28, -1.97, -1.89, 15.6, 6.36, 10.16, 3.81, -0.54, 0.0, 0, 2.63, 1.23, -0.7, 0, 0, 0, 0, 0, -0.98, 1.63, 0, -1.56, -0.02, 9.3, 5.08, -0.09, 0.0, -1.09, -1.66, 23.34, -0.14, 0.0, 5.35, 0.78, -1.18, -1.5, 0.0, 3.1, 7.08, 0.22, 0.0, -2.64, 0, 0, 4.88, 1.87, -0.1, 2.84, 0.86, -1.37, -0.37, 1.39, 0, 0, 0, -0.48, 0, 0, 1.09, 0, -1.05, 9.12, -0.96, -4.23, 5.58, 4.76, 1.7, 13.14, -4.53, -3.39, 1.44, -4.52, -10.35, -2.65, 6.76, 2.31, 0.45, 9.9, -0.78, 6.85, -9.85, -5.95, 4.05, 4.35, 4.4, -6.4, 10.44, 2.48, -3.38, -6.05, -1.41, 0.0, 3.14, 0, 1.61, 0, 0.18, 0, 2.01, -0.88, 2.57, 0.98, 1.35, 0, 0, -4.74, 6.69, 3.16, 7.86, 7.06, 0.22, 4.77, 0.24, -6.79, -5.35, 0.62, 0, -3.06, -2.09, 1.53, -1.46, 2.96, 0, 3.58, 1.56, 0, 2.58, -8.24, 0.0, 0.78, 0.52, 2.54, 1.35, 2.52, 3.75, 2.58, 0, -0.6, 4.23, 0.91, -3.72, -0.26, -3.14, 0.0, -0.38, -5.16, 0.12, 0.44, -4.68, -8.42, 5.32, 10.35, 1.44, 16.0, 14.4, -4.6, 12.9, 12.95, 7.49, -2.88, 19.95, -9.6] ------------------------------------------------------------------------------------------------------------------- Episode 1201, # Stocks: 5, Total Rewards: 407.34, Portfolio Value: 1277.39, Available Budget: 340.19, Buys: 59, Sells: 54 ------------------------------------------------------------------------------------------------------------------- Daily Profits: [0, 0, 0, 0, 0, 0, -0.67, 6.3, 7.88, -11.72, 7.74, 7.98, 26.95, 14.72, -11.08, 17.52, -13.65, -3.15, 0.56, 11.36, -2.6, 6.39, -1.62, 0.0, 4.07, 0, 0.98, 2.69, 0, 0.51, 2.1, 0, 0, 0, 2.23, 0, 0, 0, 3.94, 6.36, 
0.8, 2.86, -1.7, 9.6, 9.4, -7.25, 4.4, 3.96, -7.88, -2.52, 18.72, 7.95, 12.7, 6.35, -1.62, -3.74, 2.7, -7.89, -1.23, -1.4, 21.84, -1.4, 0.08, 3.72, 5.8, -4.9, -4.89, 1.24, -7.8, -0.05, 18.6, 7.62, -0.54, -6.3, -6.54, -6.64, 38.9, -0.42, -6.92, 12.49, 1.36, -4.72, -3.0, 0.0, 3.1, 16.52, 0.66, -3.84, -7.92, 1.68, 8.05, 12.2, 13.09, -0.3, 19.88, 6.02, -8.22, -2.22, -8.34, 16.5, 2.73, 16.98, -2.88, 3.2, 8.24, -4.36, 0.27, -3.15, 15.2, -1.6, -7.05, 19.53, 7.14, 2.04, 30.66, -9.06, -6.78, 2.4, -5.65, -12.42, -3.18, 8.45, 3.85, 1.05, 19.8, -1.56, 9.59, -11.82, -7.14, 4.86, 5.22, 4.4, -7.68, 18.27, 3.1, -5.07, -12.1, -4.23, -45.9, -18.84, 5.7, -9.66, -0.88, -0.9, 6.68, -4.02, -2.64, -10.28, 1.47, 6.75, 8.34, 27.23, -18.96, 11.15, 7.9, 27.51, 21.18, 1.1, 6.36, 0.72, -27.16, -21.4, 3.72, 5.9, -15.3, -12.54, 10.71, -2.92, 14.8, 4.4, -14.32, -3.12, 2.58, 6.45, -16.48, -7.65, 1.3, 3.64, 15.24, -8.1, 8.82, 6.25, 18.06, 10.5, -3.6, 9.87, 4.55, -11.16, -0.52, -4.71, -5.24, -0.76, -10.32, 0.72, 3.08, -14.04, -16.84, 9.31, 14.49, 3.36, 16.0, 25.2, -5.52, 18.06, 18.13, 5.35, -1.44, 23.94, -9.6] ------------------------------------------------------------------------------------------------------------------- Episode 1301, # Stocks: 1, Total Rewards: 206.73, Portfolio Value: 1086.56, Available Budget: 899.12, Buys: 54, Sells: 53 ------------------------------------------------------------------------------------------------------------------- Daily Profits: [0, 0.73, 0.12, 0, 0, 0, 0, 4.2, 1.97, -5.86, 5.16, 5.7, 16.17, 7.36, 0.0, 2.92, -5.46, -2.1, 0.28, 5.68, -1.95, 4.26, -3.24, 0.0, -4.07, 0.43, 0.49, 2.69, 0, 0.51, 2.1, 0, 0, 5.6, 2.23, 0, 0, 0, 3.94, 2.12, 1.2, 2.86, 0.85, 0, 3.76, -1.45, 0, 0, 1.97, -0.63, 3.12, 0, 5.08, 1.27, -0.54, 1.87, 1.8, 2.63, 1.23, -0.7, 16.38, -1.05, 0.06, 3.72, 5.8, -3.92, -3.26, 0.93, -6.24, -0.02, 13.95, 2.54, -0.27, -3.15, -3.27, -4.98, 15.56, -0.21, -1.73, 3.57, 0.78, -1.18, -1.5, 0.0, 1.86, 11.8, 0.66, -2.88, -5.28, 0.56, 1.15, 7.32, 1.87, -0.05, 
8.52, 0.86, -1.37, -0.37, -1.39, 2.75, 0.39, 0, 0, 1.28, 0, 1.09, 0.18, 1.05, 6.08, 0.32, 0, 0, 0, 0, 8.76, -3.02, -2.26, 0.48, -2.26, -6.21, -0.53, 6.76, 2.31, 0.3, 6.6, -0.52, 1.37, -1.97, 1.19, 1.62, 0.87, 0.88, -2.56, 2.61, 0, -0.84, 3.02, 0, 9.18, 3.14, 0, 0, -0.22, 0.18, 0, -2.01, -0.88, 2.57, 0, 0, 2.78, 3.89, -4.74, 6.69, 1.58, 3.93, 10.59, 0.22, 1.59, 0.24, 6.79, -5.35, 0, 0, 0, -2.09, 1.53, 0.73, 0, 0, 0, 1.56, 0, 0, -4.12, -1.53, 0, 0, 5.08, 1.35, 0, 2.5, 2.58, 1.5, -1.2, 2.82, 1.82, -5.58, -0.39, -1.57, -2.62, -0.76, -5.16, 0.24, 0.88, 0.0, -8.42, 1.33, 2.07, 0.48, 0, 0, -0.92, 2.58, 0, 0, 0.48, 7.98, 1.6] ------------------------------------------------------------------------------------------------------------------- Episode 1401, # Stocks: 0, Total Rewards: 291.25, Portfolio Value: 1155.31, Available Budget: 1155.31, Buys: 55, Sells: 55 ------------------------------------------------------------------------------------------------------------------- Daily Profits: [0, 0.73, 0.12, 0, 6.48, 0, -0.67, 2.1, 5.91, -8.79, 6.45, 4.56, 32.34, 25.76, -16.62, 17.52, -16.38, -6.3, 0.7, 19.88, -3.9, 14.91, -6.48, -6.96, -16.28, 3.01, 3.43, -16.14, 7.26, -3.06, -8.4, 3.0, 25.6, 14.0, -11.15, 6.35, -11.4, -12.54, 13.79, 12.72, 2.0, 11.44, -4.25, 16.8, 13.16, -5.8, 7.7, 6.6, -11.82, -2.52, 21.84, 9.54, 15.24, 7.62, -2.16, -11.22, 6.3, -10.52, -3.69, -1.4, 16.38, -1.05, 0.06, 3.72, 3.48, -0.98, 0.0, 0, -1.56, 0.01, 0, 2.54, 0.09, 0, 0, 0, 15.56, -0.07, -1.73, 5.35, 0.19, -1.18, -1.0, 0.0, 0.62, 2.36, 0.11, 0.96, -2.64, 0.84, 1.15, 2.44, 0, -0.05, 8.52, 1.72, -2.74, -0.74, -2.78, 2.75, 1.17, 5.66, -1.44, 1.28, 4.12, -2.18, 0.18, -3.15, 15.2, -1.28, -2.82, 13.95, 7.14, 2.38, 30.66, -9.06, -6.78, 2.88, -4.52, -12.42, -2.12, 8.45, 5.39, 0.75, 16.5, -0.78, 4.11, -5.91, -3.57, 1.62, 3.48, 2.64, -3.84, 13.05, 2.48, -3.38, -12.1, -5.64, -36.72, -15.7, 4.75, -8.05, -1.32, -0.72, 8.35, -10.05, -5.28, -15.42, 2.45, 6.75, 9.73, 23.34, -28.44, 15.61, 11.06, 27.51, 24.71, 1.1, 
7.95, 0.96, -13.58, -16.05, 3.1, 4.72, -12.24, -4.18, 4.59, -2.92, 11.84, 4.4, -14.32, -6.24, 5.16, 5.16, -8.24, -1.53, 0.26, 0, 5.08, 1.35, 2.52, 1.25, 2.58, 4.5, -1.2, 5.64, 1.82, -5.58, -0.13, 0.0, 1.31, 0.38, -2.58, 0, 0, 2.34, -4.21, 0, 0, 0, 6.4, 3.6, -1.84, 5.16, 10.36, 2.14, -0.96, 7.98, 0.0] ------------------------------------------------------------------------------------------------------------------- Episode 1501, # Stocks: 1, Total Rewards: 298.80, Portfolio Value: 1145.35, Available Budget: 957.91, Buys: 59, Sells: 58 ------------------------------------------------------------------------------------------------------------------- Daily Profits: [0, -0.73, 0, 0, 0, 0, 0, 4.2, 5.91, 0.0, 1.29, 1.14, 0, 7.36, -2.77, 0, 2.73, 1.05, 0, 5.68, -1.3, 8.52, -1.62, -3.48, -16.28, 1.72, 1.47, -8.07, 2.42, -1.53, -2.1, 2.4, 15.36, 5.6, -4.46, 1.27, -4.56, -4.18, 7.88, 4.24, 1.6, 14.3, -3.4, 9.6, 7.52, -2.9, 3.3, 6.6, -7.88, -3.15, 21.84, 11.13, 17.78, 8.89, -3.24, -11.22, 4.5, -7.89, -4.92, -3.5, 38.22, -2.1, 0.12, 6.2, 4.64, -3.92, -6.52, 1.24, -3.12, -0.03, 23.25, 5.08, -0.45, -5.25, -5.45, -8.3, 54.46, -0.42, -10.38, 12.49, 1.36, -7.08, -3.0, 0.0, 3.72, 16.52, 0.55, -5.76, -15.84, 1.4, 5.75, 9.76, 5.61, -0.05, 2.84, 0, 0, 0.37, 0, 5.5, 0.39, 8.49, -1.44, 1.28, 8.24, -4.36, 0.27, -4.2, 18.24, -1.6, -4.23, 8.37, 2.38, 0.34, 13.14, 0.0, 1.13, 0, -1.13, 2.07, 0.53, 3.38, 2.31, 0.15, 9.9, -0.78, 2.74, -3.94, -3.57, 4.05, 2.61, 2.64, -5.12, 7.83, 1.86, -3.38, -12.1, -7.05, -27.54, -15.7, 3.8, -3.22, -0.66, -0.54, 3.34, 0.0, -0.88, -5.14, 0.98, 2.7, 1.39, 0, 4.74, 0, 0, 0, 7.06, 0, 0, 0, -6.79, 5.35, 0, 0, -3.06, 2.09, 3.06, 0.73, 5.92, 1.1, 3.58, 0, 0, 0, 4.12, 1.53, 0, 0, 5.08, -2.7, 1.26, 3.75, 2.58, 1.5, -0.6, 1.41, 0.91, 1.86, 0.13, -1.57, -1.31, -0.76, 0.0, 0.12, 0.44, 2.34, 0, 2.66, 6.21, 0.48, 3.2, 3.6, -1.84, 10.32, 5.18, 4.28, -1.44, 7.98, -3.2] 
------------------------------------------------------------------------------------------------------------------- Episode 1601, # Stocks: 4, Total Rewards: 256.35, Portfolio Value: 1137.08, Available Budget: 387.32, Buys: 50, Sells: 46 ------------------------------------------------------------------------------------------------------------------- Daily Profits: [0, 0.73, 0, 0, 0, 0, 0.67, 4.2, 5.91, -8.79, 3.87, 3.42, 26.95, 11.04, -2.77, 5.84, -5.46, -3.15, 0.28, 5.68, -1.95, 6.39, -4.86, -1.16, 0.0, 0, 0, 0, 0, 0, 2.1, 0, 0, 0, 2.23, 2.54, -2.28, -2.09, 1.97, 2.12, 0.4, 0, 0, 0, 0, 1.45, 0, 0, 1.97, 0.63, 0, 0, 0, 2.54, -1.08, 0.0, 0.9, 2.63, 0, 0, 0, 0.35, 0, 0, 0, 0.98, 0, 0, 1.56, 0.01, 9.3, 1.27, -0.18, -2.1, 0.0, -3.32, 31.12, -0.21, -1.73, 3.57, 0.78, -1.18, -1.5, 0.0, 3.1, 14.16, 0.55, -5.76, -10.56, 1.12, 3.45, 12.2, 7.48, -0.25, 19.88, 4.3, -8.22, -2.22, -5.56, 19.25, 1.95, 11.32, -1.92, 3.84, 10.3, -5.45, 0.45, -3.15, 12.16, -1.28, -2.82, 8.37, 3.57, 1.02, 8.76, -3.02, -2.26, 0.96, -2.26, -6.21, -1.59, 8.45, 3.08, 0.45, 16.5, -0.52, 6.85, -7.88, -5.95, 3.24, 3.48, 5.28, -6.4, 10.44, 1.86, -3.38, -6.05, -4.23, -36.72, -12.56, 3.8, -6.44, -0.88, -0.36, 3.34, -6.03, -2.64, -7.71, 1.47, 4.05, 6.95, 15.56, -18.96, 8.92, 4.74, 11.79, 7.06, 0.88, 7.95, 1.44, -33.95, -26.75, 3.1, 7.08, -9.18, -4.18, 4.59, -0.73, 2.96, 1.1, 3.58, 1.56, 0, 2.58, 4.12, 1.53, 0.52, 1.56, 10.16, -4.05, 6.3, 5.0, 15.48, 7.5, -3.0, 5.64, 3.64, -3.72, -0.39, -4.71, -3.93, -1.52, -5.16, 0.6, 2.64, -14.04, -25.26, 7.98, 14.49, 2.4, 12.8, 21.6, -5.52, 18.06, 18.13, 5.35, -2.88, 19.95, -4.8] ------------------------------------------------------------------------------------------------------------------- Episode 1701, # Stocks: 0, Total Rewards: 266.37, Portfolio Value: 1125.50, Available Budget: 1125.50, Buys: 63, Sells: 63 ------------------------------------------------------------------------------------------------------------------- Daily Profits: [0, 0.73, 0, 0, 6.48, 4.26, 
-1.34, 8.4, 9.85, -11.72, 5.16, 4.56, 32.34, 25.76, -16.62, 14.6, -16.38, -6.3, 0.7, 14.2, -3.25, 14.91, -9.72, -4.64, -20.35, 2.15, 3.43, -10.76, 4.84, -2.04, -8.4, 2.4, 15.36, 14.0, -11.15, 6.35, -13.68, -12.54, 13.79, 10.6, 2.0, 11.44, -4.25, 9.6, 5.64, -1.45, 2.2, 5.28, -5.91, -1.89, 9.36, 3.18, 10.16, 6.35, -1.08, -1.87, 1.8, -5.26, 0.0, -1.4, 5.46, 0.35, 0, 2.48, 1.16, 0.98, 1.63, 0, -1.56, 0.01, 9.3, 1.27, 0.09, -1.05, -1.09, -3.32, 31.12, -0.07, -5.19, 5.35, 0.39, 0.0, -1.0, 0.0, 3.1, 14.16, 0.55, -2.88, -10.56, 0.84, 5.75, 9.76, 7.48, -0.1, 5.68, 3.44, -5.48, -0.74, -4.17, 13.75, 1.56, 11.32, -0.96, 1.28, 8.24, -4.36, 0.36, -2.1, 9.12, -0.32, 0.0, 2.79, 3.57, 0.68, 4.38, 1.51, 0, 0.96, 1.13, 2.07, 0.53, 0, 0, 0.3, 0, 0.26, 0, -1.97, 1.19, 1.62, 0, 1.76, -2.56, 2.61, 0.62, -0.84, -6.05, -2.82, -27.54, -9.42, 4.75, -6.44, -0.44, -0.18, 6.68, -6.03, -0.88, 0.0, 0, 0, 2.78, 0, 0, 0, 3.16, 11.79, 14.12, 0.66, 3.18, 0.48, 0.0, 5.35, 0, 2.36, 3.06, 0, 0, 0.73, 0, 2.2, -3.58, 1.56, 0, 0, 0, 1.53, 0, 1.04, 7.62, 0.0, 0, 0, 0, 0, -0.6, 0, 1.82, -3.72, -0.39, -6.28, -5.24, -0.76, -10.32, 0.72, 1.76, -9.36, -8.42, 6.65, 6.21, 0.96, 6.4, 14.4, -0.92, 10.32, 5.18, 2.14, -0.96, 3.99, 1.6] ------------------------------------------------------------------------------------------------------------------- Episode 1801, # Stocks: 4, Total Rewards: 363.27, Portfolio Value: 1246.65, Available Budget: 496.90, Buys: 56, Sells: 52 ------------------------------------------------------------------------------------------------------------------- Daily Profits: [0, 0.73, 0, 0, 6.48, 4.26, -2.01, 4.2, 3.94, 0.0, 3.87, 1.14, 0, 7.36, 2.77, 0, 0, 1.05, 0.28, 8.52, 0.0, 6.39, -3.24, 0.0, -8.14, 1.72, 0.98, 0.0, 1.21, -1.02, -4.2, 2.4, 25.6, 16.8, -13.38, 6.35, -11.4, -12.54, 11.82, 14.84, 2.4, 17.16, -3.4, 16.8, 11.28, -8.7, 6.6, 9.24, -11.82, -3.78, 15.6, 11.13, 12.7, 8.89, -3.24, -11.22, 6.3, -15.78, -4.92, -4.2, 32.76, -2.1, 0.14, 8.68, 6.96, -5.88, -9.78, 1.86, -6.24, -0.06, 32.55, 
6.35, -0.45, -5.25, -5.45, -8.3, 38.9, -0.21, -3.46, 8.92, 1.17, -3.54, -2.0, 0.0, 2.48, 7.08, 0.55, -3.84, -10.56, 0.84, 5.75, 9.76, 7.48, -0.2, 17.04, 4.3, -4.11, -1.85, -4.17, 8.25, 1.17, 5.66, -1.44, 3.2, 12.36, -5.45, 0.45, -6.3, 18.24, -1.28, -8.46, 16.74, 5.95, 1.36, 17.52, -3.02, -3.39, 2.4, -2.26, -8.28, -2.12, 10.14, 3.85, 1.05, 19.8, -1.04, 6.85, -9.85, -3.57, 2.43, 1.74, 3.52, -5.12, 15.66, 3.1, -2.53, -12.1, -5.64, -36.72, -12.56, 3.8, -8.05, -0.66, -0.36, 8.35, -8.04, -1.76, -2.57, 0.98, 2.7, 2.78, 7.78, 0.0, 2.23, 1.58, 3.93, 0, 0, 0, 0, 0, 5.35, 0, 0, 0, -2.09, 1.53, -1.46, 5.92, 2.2, -7.16, -3.12, 1.72, 5.16, -4.12, -3.06, 0.26, 0, 0, 1.35, 2.52, 1.25, 7.74, 1.5, -1.2, 5.64, 4.55, -7.44, -0.65, -9.42, -7.86, -2.28, -15.48, 0.6, 1.76, -9.36, -21.05, 9.31, 14.49, 3.36, 16.0, 25.2, -5.52, 18.06, 18.13, 5.35, -2.88, 27.93, -6.4] ------------------------------------------------------------------------------------------------------------------- Episode 1901, # Stocks: 2, Total Rewards: 273.55, Portfolio Value: 1133.71, Available Budget: 758.83, Buys: 55, Sells: 53 ------------------------------------------------------------------------------------------------------------------- Daily Profits: [0, 0, 0, 0, 0, 0, -0.67, 0, 3.94, -2.93, 0, 0, 0, 7.36, -5.54, 2.92, -5.46, 0.0, 0.14, 2.84, -0.65, 0, 0, 0, 0, 0, 0, 2.69, 2.42, 0.51, 2.1, 1.2, 15.36, 2.8, -2.23, 3.81, -4.56, -6.27, 3.94, 8.48, 1.2, 5.72, 0.0, 2.4, 0, 1.45, 0, 0, -1.97, 0.63, 0, 3.18, 7.62, 1.27, -1.08, 0.0, 0, 0, 0, 0, 10.92, -0.35, 0.02, 0, 2.32, -0.98, 1.63, 0.62, -1.56, 0.01, 9.3, 0, 0, 1.05, -1.09, -3.32, 31.12, -0.21, -6.92, 5.35, 0.39, 0.0, 0.5, 0, 1.24, 0, 0, 0, 2.64, 0, 0, 0, 0, 0, 5.68, 2.58, -2.74, 0.0, -1.39, 8.25, 0.39, 8.49, -1.44, 1.28, 4.12, -2.18, 0.36, -4.2, 18.24, -1.6, -7.05, 16.74, 4.76, 1.02, 13.14, -4.53, -1.13, 1.92, -3.39, -2.07, -1.06, 3.38, 1.54, 0.6, 16.5, -1.04, 5.48, -7.88, -2.38, 1.62, 0.87, 0.88, -2.56, 5.22, 1.24, -1.69, 0.0, -1.41, -9.18, -6.28, 1.9, -4.83, 
-0.66, -0.54, 3.34, -6.03, -2.64, -10.28, 2.94, 5.4, 8.34, 15.56, -23.7, 15.61, 7.9, 19.65, 24.71, 1.54, 7.95, 0.96, -13.58, -5.35, 1.24, 4.72, -9.18, -6.27, 4.59, -0.73, 2.96, 1.1, 3.58, 0, 1.72, 3.87, 0.0, -1.53, 0, 0, 0, 1.35, 2.52, 0, 0, 0, 0, 0, 0, 1.86, 0, 1.57, -1.31, 0.38, 2.58, 0, 0, 0, 4.21, 0, 4.14, 0, 6.4, 3.6, -0.92, 2.58, 7.77, 4.28, -0.48, 15.96, -1.6] ------------------------------------------------------------------------------------------------------------------- Episode 2001, # Stocks: 4, Total Rewards: 330.83, Portfolio Value: 1198.53, Available Budget: 448.77, Buys: 61, Sells: 57 ------------------------------------------------------------------------------------------------------------------- Daily Profits: [0, 0.73, 0, 0, 6.48, 0, 0.67, 0, 3.94, -5.86, 1.29, 0, 0, 0, 0, 0, 0, -1.05, 0.14, 2.84, -0.65, 0, -1.62, -2.32, -12.21, 1.29, 0.98, 0.0, 0, -0.51, 2.1, 1.2, 15.36, 5.6, 0.0, 3.81, -4.56, 0.0, 0, 0, 0, 0, -0.85, 7.2, 7.52, -5.8, 4.4, 7.92, -11.82, -3.78, 21.84, 11.13, 12.7, 5.08, -1.08, -5.61, 4.5, -10.52, -4.92, -2.8, 21.84, -1.4, 0.08, 7.44, 5.8, -4.9, -9.78, 1.86, -9.36, -0.04, 32.55, 8.89, -0.36, -5.25, -6.54, -6.64, 38.9, -0.21, -8.65, 7.14, 0.58, -3.54, -0.5, 0.0, 0.62, 7.08, 0.22, -2.88, -2.64, 0.56, 1.15, 7.32, 3.74, -0.1, 2.84, 0.86, -2.74, -1.11, -5.56, 16.5, 1.95, 19.81, -2.88, 3.84, 12.36, -6.54, 0.45, -3.15, 12.16, -0.64, -5.64, 16.74, 8.33, 1.7, 30.66, -9.06, -4.52, 1.92, -2.26, -6.21, -1.59, 5.07, 2.31, 0.75, 13.2, -1.04, 5.48, -7.88, -4.76, 4.86, 6.09, 4.4, -7.68, 18.27, 3.72, -5.07, -18.15, -8.46, -55.08, -12.56, 6.65, -9.66, -1.32, -1.08, 10.02, -12.06, -3.52, -7.71, 1.96, 8.1, 6.95, 27.23, -18.96, 8.92, 4.74, 11.79, 10.59, 0.44, 3.18, 0.24, 6.79, 5.35, 1.24, 0, -3.06, -2.09, 4.59, 0.0, 2.96, 3.3, 0.0, -1.56, 0.86, 3.87, 0.0, -3.06, 1.04, 2.6, 15.24, -6.75, 5.04, 7.5, 18.06, 10.5, -3.6, 9.87, 4.55, -5.58, -0.52, -3.14, -3.93, -0.38, 0.0, 0, 0.88, -2.34, -8.42, 2.66, 2.07, 1.44, 12.8, 18.0, -3.68, 10.32, 7.77, 5.35, -2.4, 
15.96, -3.2] ------------------------------------------------------------------------------------------------------------------- Episode 2101, # Stocks: 3, Total Rewards: 240.85, Portfolio Value: 1131.37, Available Budget: 569.05, Buys: 58, Sells: 55 ------------------------------------------------------------------------------------------------------------------- Daily Profits: [0, 0, 0.12, 7.8, 3.24, 1.42, -1.34, 4.2, 3.94, -5.86, 5.16, 3.42, 16.17, 7.36, -5.54, 5.84, -5.46, -3.15, 0.28, 2.84, -0.65, 6.39, -4.86, -1.16, -8.14, 0.43, 0, 2.69, 0, -0.51, -2.1, 1.8, 10.24, 11.2, -8.92, 3.81, -6.84, -8.36, 5.91, 4.24, 1.6, 5.72, 0.0, 7.2, 7.52, -1.45, 2.2, 2.64, -5.91, -0.63, 12.48, 7.95, 15.24, 5.08, -2.7, -5.61, 2.7, -7.89, -3.69, -0.7, 21.84, -0.35, 0.02, 0, 2.32, -1.96, -4.89, 0.62, -4.68, -0.01, 18.6, 3.81, -0.27, -3.15, -4.36, -6.64, 23.34, -0.21, -5.19, 5.35, 0.39, -2.36, -1.5, 0.0, 1.86, 11.8, 0.66, -2.88, -10.56, 1.12, 3.45, 7.32, 3.74, -0.15, 8.52, 4.3, -5.48, -1.85, -4.17, 11.0, 2.34, 19.81, -2.88, 4.48, 10.3, -6.54, 0.54, -4.2, 12.16, -1.28, -7.05, 19.53, 8.33, 2.04, 30.66, -6.04, -3.39, 1.92, -4.52, -4.14, -2.12, 10.14, 5.39, 0.9, 19.8, -1.04, 6.85, -11.82, -7.14, 4.86, 5.22, 5.28, -7.68, 15.66, 4.34, -5.07, -18.15, -5.64, -55.08, -18.84, 5.7, -6.44, -1.32, -1.08, 10.02, -12.06, -5.28, -15.42, 2.94, 9.45, 9.73, 27.23, -28.44, 11.15, 6.32, 15.72, 14.12, 0.88, 6.36, 0.72, -27.16, -10.7, 3.1, 7.08, -18.36, -8.36, 10.71, -4.38, 20.72, 6.6, -21.48, -6.24, 4.3, 6.45, -24.72, -9.18, 1.82, 3.64, 12.7, -8.1, 6.3, 8.75, 18.06, 10.5, -3.6, 8.46, 5.46, -11.16, -0.52, -9.42, -5.24, -1.14, -5.16, 0.24, 0.88, -4.68, -8.42, 5.32, 4.14, 0.96, 6.4, 14.4, -0.92, 10.32, 12.95, 6.42, -1.44, 15.96, -6.4] ------------------------------------------------------------------------------------------------------------------- Episode 2201, # Stocks: 6, Total Rewards: 270.26, Portfolio Value: 1153.51, Available Budget: 28.87, Buys: 55, Sells: 49 
[Per-episode Daily Profits arrays truncated; episode summaries below]
Episode 2301, # Stocks: 1, Total Rewards: 219.57, Portfolio Value: 1099.51, Available Budget: 912.07, Buys: 61, Sells: 60
Episode 2401, # Stocks: 3, Total Rewards: 255.44, Portfolio Value: 1157.79, Available Budget: 595.47, Buys: 52, Sells: 49
Episode 2501, # Stocks: 4, Total Rewards: 276.06, Portfolio Value: 1135.84, Available Budget: 386.08, Buys: 62, Sells: 58
Episode 2601, # Stocks: 5, Total Rewards: 210.32, Portfolio Value: 1101.22, Available Budget: 164.02, Buys: 65, Sells: 60
Episode 2701, # Stocks: 5, Total Rewards: 233.26, Portfolio Value: 1116.39, Available Budget: 179.19, Buys: 59, Sells: 54
Episode 2801, # Stocks: 5, Total Rewards: 330.46, Portfolio Value: 1201.49, Available Budget: 264.29, Buys: 68, Sells: 63
Episode 2901, # Stocks: 3, Total Rewards: 352.83, Portfolio Value: 1227.89, Available Budget: 665.57, Buys: 67, Sells: 64
plt.figure(figsize=(15, 10))
plt.plot(rewards1, label='Rewards', linestyle='-',color='blue')
plt.plot(profits1, label='Portfolio Values', linestyle='-', color='red')
plt.title('Q-Learning Profit Progression')
plt.xlabel('Episode Number')
plt.ylabel('Profit')
plt.legend()
plt.grid(True)
plt.ylim(-100, 3000)
plt.show()
print("Average Reward Value Over Training: $", round(np.mean(rewards1),2))
print("Maximum Reward Value Over Training: $", round(np.max(rewards1),2))
print("Minimum Reward Value Over Training: $", round(np.min(rewards1),2))
print("------------------------------------------------------------------")
print("Average Portfolio Value Over Training: $", round(np.mean(profits1),2))
print("Maximum Portfolio Value Over Training: $", round(np.max(profits1),2))
print("Minimum Portfolio Value Over Training: $", round(np.min(profits1),2))
Average Reward Value Over Training: $ 291.75
Maximum Reward Value Over Training: $ 574.12
Minimum Reward Value Over Training: $ 80.49
------------------------------------------------------------------
Average Portfolio Value Over Training: $ 1167.05
Maximum Portfolio Value Over Training: $ 1420.69
Minimum Portfolio Value Over Training: $ 976.52
So on average we are turning our 1000 USD initial investment into 1167.05 USD, and the algorithm behaves quite stably, with a high of 1420.69 USD and a low of 976.52 USD.
Now the DQN builds upon the simple Q-Learning algorithm as follows: the Q-table is replaced with a neural network that approximates q-values from the state, an experience-replay memory lets the agent train on randomly sampled past transitions, and a separate target network provides stable target q-values for the Bellman update.
Note also that we will show a few variations of the DQN, where some aspects are added or removed, to see whether stability/convergence can be reached through training. We will also show an execution with only a policy network (no target network) in Part 3.
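Before the full class, the core update can be isolated in a small sketch (NumPy only; the q-value arrays here are hypothetical stand-ins for the two networks' predictions, not output from the actual models):

```python
import numpy as np

# Hypothetical network outputs: q-values per action for the current and next state.
policy_q_current = np.array([0.50, 0.20, 0.10])  # stand-in for model.predict(state)[0]
target_q_next = np.array([0.80, 0.30, 0.00])     # stand-in for target_model.predict(next_state)[0]

gamma = 0.99   # discount factor
reward = 1.0   # immediate reward for the transition
action = 0     # action taken in the current state
done = False   # episode continues

# Bellman target: r + gamma * max_a' Q_target(s', a') while the episode is not done
q_target = reward
if not done:
    q_target += gamma * np.max(target_q_next)

# The policy network is then fit so that Q(s, action) moves toward q_target,
# while the q-values of the other actions are left unchanged
updated_q = policy_q_current.copy()
updated_q[action] = q_target
print(round(q_target, 3))  # 1.792
```

Reading the target q-values from a separate, periodically synchronized network is what keeps the regression targets from shifting on every gradient step.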
class DQNAgent:
    def __init__(self, state_size, action_size, learning_rate=0.001,
                 discount_factor=0.99, epsilon=1.0, epsilon_decay=0.995,
                 epsilon_min=0.01, batch_size=32, memory_size=2000,
                 target_update_freq=100):
        self.state_size = state_size
        self.action_size = action_size
        self.learning_rate = learning_rate
        self.discount_factor = discount_factor
        self.epsilon = epsilon
        self.epsilon_decay = epsilon_decay
        self.epsilon_min = epsilon_min
        self.batch_size = batch_size
        self.memory = deque(maxlen=memory_size)
        # main policy network and target network
        self.model = self._build_model()  # policy network to predict q-values for actions
        self.target_model = clone_model(self.model)  # target network used for providing stable target q-values
        self.update_target_model()  # ensure the target network starts synchronized with the policy network
        self.target_update_freq = target_update_freq  # sets the frequency at which target weights are updated

    # create the policy network (simple single-layer model to expedite training and processing time)
    def _build_model(self):
        model = Sequential()
        model.add(Dense(24, input_dim=self.state_size, activation='relu'))
        model.add(Dense(self.action_size, activation='linear'))  # output layer sized to the action space
        model.compile(loss='mse', optimizer=Adam(learning_rate=self.learning_rate))
        return model

    # update the target network weights with the policy network weights at a specified interval
    def update_target_model(self):
        self.target_model.set_weights(self.model.get_weights())

    # store experiences in memory
    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    # choose action (epsilon-greedy)
    def choose_action(self, state):
        if np.random.rand() <= self.epsilon:
            return np.random.choice(self.action_size)  # EXPLORE
        else:
            q_values = self.model.predict(state)  # EXPLOIT - predict using the policy network
            return np.argmax(q_values[0])

    # experience replay to train the policy network using experiences from memory
    def replay(self):
        if len(self.memory) < self.batch_size:
            return  # skip replay if there is not enough memory
        batch = random.sample(self.memory, self.batch_size)  # sample from memory
        for state, action, reward, next_state, done in batch:
            q_target = reward  # default to the immediate reward
            if not done:  # if the episode is not over, apply the Bellman update
                future_q = np.amax(self.target_model.predict(next_state)[0])  # best q-value for the next state from the target network
                q_target += self.discount_factor * future_q  # apply Bellman
            q_values = self.model.predict(state)  # current q-values for the state from the policy network
            q_values[0][action] = q_target  # update the q-value for the taken action with the calculated target
            self.model.fit(state, q_values, epochs=1, verbose=0)  # train the policy network on the updated q-values
        # decay epsilon
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay

    # analyze action distribution
    # NOTE: this has nothing to do with the algorithm; it is part of post-analysis
    def analyze_action_distribution(self):
        actions = [exp[1] for exp in self.memory]  # get all actions from memory
        unique, counts = np.unique(actions, return_counts=True)  # get unique actions and counts
        action_distribution = dict(zip(unique, counts))
        return action_distribution
# reward function: rewards buying low and selling high; the action-based terms are clipped at 0
# (no negative action reward), though the holdings term below can still be negative
def calculate_reward(prev_action, prev_price, current_price, num_stocks_held):
    if prev_action == 0:  # buy
        reward = max(0, current_price - prev_price)  # potential profit
    elif prev_action == 1:  # sell
        reward = max(0, prev_price - current_price)  # loss avoidance
    else:
        reward = 0  # hold
    # additional reward (or penalty) for overall portfolio movement
    if num_stocks_held > 0:
        reward += (current_price - prev_price) * num_stocks_held
    return round(reward, 2)
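As a quick sanity check of the reward shaping, a few example calls (the function is re-stated here so the snippet runs standalone; the prices and holdings are illustrative):

```python
def calculate_reward(prev_action, prev_price, current_price, num_stocks_held):
    if prev_action == 0:  # buy
        reward = max(0, current_price - prev_price)
    elif prev_action == 1:  # sell
        reward = max(0, prev_price - current_price)
    else:
        reward = 0  # hold
    if num_stocks_held > 0:
        reward += (current_price - prev_price) * num_stocks_held
    return round(reward, 2)

# Bought yesterday, price rose $5, holding 2 shares: 5 (action) + 5*2 (holdings) = 15.0
print(calculate_reward(0, 100.0, 105.0, 2))  # 15.0
# Sold yesterday, price fell $5, holding nothing: loss avoidance only = 5.0
print(calculate_reward(1, 100.0, 95.0, 0))   # 5.0
# Held while the price fell: the holdings term makes the total negative = -5.0
print(calculate_reward(2, 100.0, 95.0, 1))   # -5.0
```

The last case shows that, although the action-based terms are clipped at zero, the holdings term can still produce a negative total reward when prices fall.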
# train DQN
def train_dqn(state_matrix, action_space, num_episodes, max_budget, update_freq=25, target_update_freq=50):
    # init agent
    agent = DQNAgent(
        state_size=state_matrix.shape[1],
        action_size=len(action_space),
        target_update_freq=target_update_freq
    )
    # init portfolio values, reward values, and total steps (for update frequency)
    profits = []
    rewards = []
    total_steps = 0
    # loop through episodes
    for episode in range(num_episodes):
        # init episode values
        total_reward = 0
        num_stocks_held = 0
        available_budget = max_budget
        state = state_matrix[0].reshape(1, -1)
        prev_portfolio_value = available_budget  # portfolio value at the start
        prev_action = None  # track the previous action for reward
        # init tracked actions
        buys = 0
        sells = 0
        # loop through the state matrix
        for t in range(1, len(state_matrix) - 1):
            action = agent.choose_action(state)  # choose action
            next_state = state_matrix[t + 1].reshape(1, -1)  # pick the next state
            current_price = state_matrix[t, 0]  # today's price
            prev_price = state_matrix[t - 1, 0]  # yesterday's price
            # calculate today's portfolio value
            current_portfolio_value = available_budget + (num_stocks_held * current_price)
            # calculate reward based on the updated reward function (i.e., what profit did yesterday's action get me)
            reward = calculate_reward(prev_action, prev_price, current_price, num_stocks_held)
            # track the previous action for reward calculation
            prev_action = action
            # update budget and inventory based on the action
            if action == 0 and available_budget >= current_price:  # buy if I have budget
                # adjust inventory, budget, and tracking
                num_stocks_held += 1
                available_budget -= current_price
                buys += 1
            elif action == 1 and num_stocks_held > 0:  # sell if I have inventory
                # adjust inventory, budget, and tracking
                num_stocks_held -= 1
                available_budget += current_price
                sells += 1
            # the code needs to process the next state, which would not be possible on the very last step
            done = (t == len(state_matrix) - 2)  # check if at the end of the episode
            # store the experience in agent memory
            agent.remember(state, action, reward, next_state, done)
            # run replay if the replay frequency is reached
            # NOTE: THIS HAD TO BE DONE IN ORDER TO SPEED UP PROCESSING TIME
            if total_steps % update_freq == 0:
                agent.replay()
            # update the target network weights from the policy network if the update frequency is reached
            if total_steps % agent.target_update_freq == 0:
                agent.update_target_model()
            # update state and total reward
            total_reward += reward
            state = next_state
            total_steps += 1  # update frequency counter
            # update the portfolio value
            prev_portfolio_value = current_portfolio_value
        # calculate the final portfolio value
        final_portfolio_value = available_budget + (num_stocks_held * state_matrix[-1, 0])
        # record profit/portfolio values and total rewards
        profits.append(final_portfolio_value)
        rewards.append(total_reward)
        if episode % 50 == 0:
            print(
                f"Episode {episode + 1}, "
                f"# Stocks: {num_stocks_held}, "
                f"Total Rewards: {total_reward:.2f}, "
                f"Final Portfolio Value: {final_portfolio_value:.2f}, "
                f"Available Budget: {available_budget:.2f}, "
                f"Buys: {buys}, "
                f"Sells: {sells}"
            )
    return agent, profits, rewards
agent5, profits5, rewards5 = train_dqn(lagged_df.values, action_space=[0, 1, 2],
                                       num_episodes=3000, max_budget=1000, update_freq=100, target_update_freq=200)
Episode 1, # Stocks: 4, Total Rewards: 358.38, Final Portfolio Value: 1217.36, Available Budget: 467.60, Buys: 56, Sells: 52
Episode 51, # Stocks: 4, Total Rewards: 122.42, Final Portfolio Value: 1032.58, Available Budget: 282.82, Buys: 42, Sells: 38
Episode 101, # Stocks: 1, Total Rewards: 141.14, Final Portfolio Value: 1007.34, Available Budget: 819.90, Buys: 26, Sells: 25
Episode 151, # Stocks: 7, Total Rewards: 447.79, Final Portfolio Value: 1343.27, Available Budget: 31.19, Buys: 18, Sells: 11
Episode 201, # Stocks: 6, Total Rewards: 477.08, Final Portfolio Value: 1319.88, Available Budget: 195.24, Buys: 20, Sells: 14
Episode 251, # Stocks: 0, Total Rewards: 206.44, Final Portfolio Value: 1165.21, Available Budget: 1165.21, Buys: 6, Sells: 6
Episode 301, # Stocks: 0, Total Rewards: 481.71, Final Portfolio Value: 1350.38, Available Budget: 1350.38, Buys: 9, Sells: 9
Episode 351, # Stocks: 7, Total Rewards: 526.34, Final Portfolio Value: 1336.56, Available Budget: 24.48, Buys: 14, Sells: 7
Episode 401, # Stocks: 7, Total Rewards: 494.61, Final Portfolio Value: 1322.41, Available Budget: 10.33, Buys: 7, Sells: 0
Episode 451, # Stocks: 7, Total Rewards: 442.74, Final Portfolio Value: 1334.27, Available Budget: 22.19, Buys: 7, Sells: 0
Episode 501, # Stocks: 7, Total Rewards: 526.38, Final Portfolio Value: 1345.52, Available Budget: 33.44, Buys: 35, Sells: 28
Episode 551, # Stocks: 1, Total Rewards: -1.20, Final Portfolio Value: 1001.43, Available Budget: 813.99, Buys: 1, Sells: 0
Episode 601, # Stocks: 7, Total Rewards: 499.19, Final Portfolio Value: 1322.93, Available Budget: 10.85, Buys: 8, Sells: 1
Episode 651, # Stocks: 5, Total Rewards: 37.41, Final Portfolio Value: 1021.13, Available Budget: 83.93, Buys: 6, Sells: 1
Episode 701, # Stocks: 5, Total Rewards: 114.04, Final Portfolio Value: 1045.16, Available Budget: 107.96, Buys: 5, Sells: 0
Episode 751, # Stocks: 6, Total Rewards: 280.83, Final Portfolio Value: 1196.92, Available Budget: 72.28, Buys: 8, Sells: 2
Episode 801, # Stocks: 7, Total Rewards: 496.29, Final Portfolio Value: 1321.32, Available Budget: 9.24, Buys: 8, Sells: 1
Episode 851, # Stocks: 7, Total Rewards: 530.42, Final Portfolio Value: 1337.33, Available Budget: 25.25, Buys: 12, Sells: 5
Episode 901, # Stocks: 7, Total Rewards: 433.49, Final Portfolio Value: 1329.51, Available Budget: 17.43, Buys: 9, Sells: 2
Episode 951, # Stocks: 7, Total Rewards: 519.82, Final Portfolio Value: 1334.75, Available Budget: 22.67, Buys: 8, Sells: 1
Episode 1001, # Stocks: 7, Total Rewards: 424.16, Final Portfolio Value: 1320.97, Available Budget: 8.89, Buys: 8, Sells: 1
Episode 1051, # Stocks: 7, Total Rewards: 430.08, Final Portfolio Value: 1329.81, Available Budget: 17.73, Buys: 8, Sells: 1
Episode 1101, # Stocks: 7, Total Rewards: 493.59, Final Portfolio Value: 1334.27, Available Budget: 22.19, Buys: 7, Sells: 0
Episode 1151, # Stocks: 5, Total Rewards: 149.34, Final Portfolio Value: 1021.32, Available Budget: 84.12, Buys: 6, Sells: 1
Episode 1201, # Stocks: 6, Total Rewards: 350.53, Final Portfolio Value: 1262.89, Available Budget: 138.25, Buys: 7, Sells: 1
Episode 1251, # Stocks: 7, Total Rewards: 506.54, Final Portfolio Value: 1322.78, Available Budget: 10.70, Buys: 8, Sells: 1
Episode 1301, # Stocks: 5, Total Rewards: 135.71, Final Portfolio Value: 1020.00, Available Budget: 82.80, Buys: 6, Sells: 1
Episode 1351, # Stocks: 0, Total Rewards: 527.10, Final Portfolio Value: 1320.63, Available Budget: 1320.63, Buys: 25, Sells: 25
Episode 1401, # Stocks: 0, Total Rewards: 161.64, Final Portfolio Value: 1062.91, Available Budget: 1062.91, Buys: 28, Sells: 28
Episode 1451, # Stocks: 7, Total Rewards: 547.66, Final Portfolio Value: 1355.33, Available Budget: 43.25, Buys: 15, Sells: 8
Episode 1501, # Stocks: 5, Total Rewards: 24.19, Final Portfolio Value: 1018.88, Available Budget: 81.68, Buys: 9, Sells: 4
Episode 1551, # Stocks: 5, Total Rewards: 88.94, Final Portfolio Value: 1052.08, Available Budget: 114.87, Buys: 14, Sells: 9
Episode 1601, # Stocks: 7, Total Rewards: 508.10, Final Portfolio Value: 1323.56, Available Budget: 11.48, Buys: 8, Sells: 1
Episode 1651, # Stocks: 1, Total Rewards: 153.24, Final Portfolio Value: 1003.61, Available Budget: 816.17, Buys: 14, Sells: 13
Episode 1701, # Stocks: 6, Total Rewards: 308.59, Final Portfolio Value: 1192.79, Available Budget: 68.15, Buys: 34, Sells: 28
Episode 1751, # Stocks: 0, Total Rewards: 526.71, Final Portfolio Value: 1322.17, Available Budget: 1322.17, Buys: 12, Sells: 12
Episode 1801, # Stocks: 6, Total Rewards: 302.40, Final Portfolio Value: 1193.18, Available Budget: 68.54, Buys: 8, Sells: 2
Episode 1851, # Stocks: 7, Total Rewards: 517.74, Final Portfolio Value: 1333.71, Available Budget: 21.63, Buys: 9, Sells: 2
Episode 1901, # Stocks: 6, Total Rewards: 390.51, Final Portfolio Value: 1240.03, Available Budget: 115.39, Buys: 7, Sells: 1
Episode 1951, # Stocks: 7, Total Rewards: 508.67, Final Portfolio Value: 1331.87, Available Budget: 19.79, Buys: 8, Sells: 1
Episode 2001, # Stocks: 7, Total Rewards: 632.69, Final Portfolio Value: 1403.81, Available Budget: 91.73, Buys: 24, Sells: 17
Episode 2051, # Stocks: 6, Total Rewards: 417.83, Final Portfolio Value: 1324.80, Available Budget: 200.16, Buys: 7, Sells: 1
Episode 2101, # Stocks: 0, Total Rewards: 477.79, Final Portfolio Value: 1284.30, Available Budget: 1284.30, Buys: 8, Sells: 8
Episode 2151, # Stocks: 7, Total Rewards: 606.24, Final Portfolio Value: 1393.56, Available Budget: 81.48, Buys: 28, Sells: 21
Episode 2201, # Stocks: 7, Total Rewards: 400.72, Final Portfolio Value: 1313.19, Available Budget: 1.11, Buys: 10, Sells: 3
Episode 2251, # Stocks: 7, Total Rewards: 502.38, Final Portfolio Value: 1321.97, Available Budget: 9.89, Buys: 8, Sells: 1
Episode 2301, # Stocks: 5, Total Rewards: 150.97, Final Portfolio Value: 1079.08, Available Budget: 141.88, Buys: 5, Sells: 0
Episode 2351, # Stocks: 1, Total Rewards: 155.43, Final Portfolio Value: 1010.31, Available Budget: 822.87, Buys: 4, Sells: 3
Episode 2401, # Stocks: 7, Total Rewards: 508.08, Final Portfolio Value: 1323.55, Available Budget: 11.47, Buys: 7, Sells: 0
Episode 2451, # Stocks: 5, Total Rewards: 27.63, Final Portfolio Value: 1028.31, Available Budget: 91.11, Buys: 6, Sells: 1
Episode 2501, # Stocks: 0, Total Rewards: 512.29, Final Portfolio Value: 1308.21, Available Budget: 1308.21, Buys: 8, Sells: 8
Episode 2551, # Stocks: 5, Total Rewards: 174.65, Final Portfolio Value: 1034.62, Available Budget: 97.42, Buys: 9, Sells: 4
Episode 2601, # Stocks: 0, Total Rewards: 519.87, Final Portfolio Value: 1318.01, Available Budget: 1318.01, Buys: 8, Sells: 8
Episode 2651, # Stocks: 5, Total Rewards: 61.51, Final Portfolio Value: 1034.20, Available Budget: 97.00, Buys: 5, Sells: 0
Episode 2701, # Stocks: 7, Total Rewards: 518.86, Final Portfolio Value: 1334.27, Available Budget: 22.19, Buys: 7, Sells: 0
Episode 2751, # Stocks: 7, Total Rewards: 518.86, Final Portfolio Value: 1334.27, Available Budget: 22.19, Buys: 7, Sells: 0
Episode 2801, # Stocks: 0, Total Rewards: 162.65, Final Portfolio Value: 1008.24, Available Budget: 1008.24, Buys: 3, Sells: 3
Episode 2851, # Stocks: 7, Total Rewards: 518.86, Final Portfolio Value: 1334.27, Available Budget: 22.19, Buys: 7, Sells: 0
Episode 2901, # Stocks: 7, Total Rewards: 515.12, Final Portfolio Value: 1332.40, Available Budget: 20.32, Buys: 8, Sells: 1
Episode 2951, # Stocks: 5, Total Rewards: 26.21, Final Portfolio Value: 1028.96, Available Budget: 91.76, Buys: 5, Sells: 0
print("Average Reward Value Over Training: $", round(np.mean(rewards5),2))
print("Maximum Reward Value Over Training: $", round(np.max(rewards5),2))
print("Minimum Reward Value Over Training: $", round(np.min(rewards5),2))
print("------------------------------------------------------------------")
print("Average Portfolio Value Over Training: $", round(np.mean(profits5),2))
print("Maximum Portfolio Value Over Training: $", round(np.max(profits5),2))
print("Minimum Portfolio Value Over Training: $", round(np.min(profits5),2))
Average Reward Value Over Training: $ 337.47
Maximum Reward Value Over Training: $ 732.26
Minimum Reward Value Over Training: $ -57.09
------------------------------------------------------------------
Average Portfolio Value Over Training: $ 1208.25
Maximum Portfolio Value Over Training: $ 1500.27
Minimum Portfolio Value Over Training: $ 918.9
rewards = rewards5
profits = profits5
fig = go.Figure()
# reward line
fig.add_trace(go.Scatter(x=list(range(len(rewards))), y=rewards,
mode='lines',
name='Rewards',
line=dict(color='blue', dash='solid'))) # solid line with blue color
# portfolio line
fig.add_trace(go.Scatter(x=list(range(len(profits))), y=profits,
mode='lines',
name='Portfolio Values',
line=dict(color='red', dash='solid'))) # solid line with red color
# plot figure
fig.update_layout(
title='DQN Profit Progression',
xaxis=dict(title='Episode Number'),
yaxis=dict(title='Profit', range=[-100, 5000]),
legend=dict(orientation='h', x=0.5, xanchor='center', y=-0.2),
plot_bgcolor='white',
yaxis_showgrid=True,
xaxis_showgrid=True
)
# Show the plot
fig.show()
Generally, the more frequently we run experience replay and update the target network, the higher the highs the algorithm achieves on average. We are not seeing true convergence, though the agent does tend to stay within a range, as shown above.
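One way to put numbers on the "stays within a range" observation is a rolling window over the per-episode series. A sketch with a synthetic reward list standing in for `rewards5` (the synthetic series and its parameters are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a per-episode reward series like rewards5
rng = np.random.default_rng(0)
rewards = 300 + 100 * rng.standard_normal(3000)

s = pd.Series(rewards)
rolling_mean = s.rolling(window=100).mean()  # local level of the reward series
rolling_std = s.rolling(window=100).std()    # local spread of the reward series

# If training were converging, the rolling std should trend downward over episodes;
# a roughly flat rolling std indicates the agent oscillates within a band instead.
print(f"rolling std, episodes 100 and 3000: {rolling_std.iloc[99]:.1f}, {rolling_std.iloc[-1]:.1f}")
```

Applying the same calculation to the actual `rewards5` list would make the convergence claim quantitative rather than visual.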
class DQNAgent:
    def __init__(self, state_size, action_size, learning_rate=0.001,
                 discount_factor=0.99, epsilon=1.0, epsilon_decay=0.995,
                 epsilon_min=0.01, batch_size=32, memory_size=2000):
        self.state_size = state_size
        self.action_size = action_size
        self.learning_rate = learning_rate
        self.discount_factor = discount_factor
        self.epsilon = epsilon
        self.epsilon_decay = epsilon_decay
        self.epsilon_min = epsilon_min
        self.batch_size = batch_size
        self.memory = deque(maxlen=memory_size)
        # policy network only (no target network in this variant)
        self.model = self._build_model()  # policy network to predict q-values for actions

    # create the policy network (simple single-layer model to expedite training and processing time)
    def _build_model(self):
        model = Sequential()
        model.add(Dense(24, input_dim=self.state_size, activation='relu'))
        model.add(Dense(self.action_size, activation='linear'))  # output layer sized to the action space
        model.compile(loss='mse', optimizer=Adam(learning_rate=self.learning_rate))
        return model

    # store experiences in memory
    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    # choose action (epsilon-greedy)
    def choose_action(self, state):
        if np.random.rand() <= self.epsilon:
            return np.random.choice(self.action_size)  # EXPLORE
        else:
            q_values = self.model.predict(state)  # EXPLOIT - predict using the policy network
            return np.argmax(q_values[0])

    # experience replay to train the policy network using experiences from memory
    def replay(self):
        if len(self.memory) < self.batch_size:
            return  # skip replay if there is not enough memory
        batch = random.sample(self.memory, self.batch_size)  # sample from memory
        for state, action, reward, next_state, done in batch:
            q_target = reward  # default to the immediate reward
            if not done:  # if the episode is not over, apply the Bellman update
                future_q = np.amax(self.model.predict(next_state)[0])  # best q-value for the next state from the policy network itself (no target network in this variant)
                q_target += self.discount_factor * future_q  # apply Bellman
            q_values = self.model.predict(state)  # current q-values for the state from the policy network
            q_values[0][action] = q_target  # update the q-value for the taken action with the calculated target
            self.model.fit(state, q_values, epochs=1, verbose=0)  # train the policy network on the updated q-values
        # decay epsilon
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay

    # analyze action distribution
    # NOTE: this has nothing to do with the algorithm; it is part of post-analysis
    def analyze_action_distribution(self):
        actions = [exp[1] for exp in self.memory]  # get all actions from memory
        unique, counts = np.unique(actions, return_counts=True)  # get unique actions and counts
        action_distribution = dict(zip(unique, counts))
        return action_distribution

# reward function: rewards buying low and selling high; the action-based terms are clipped at 0
# (no negative action reward), though the holdings term below can still be negative
def calculate_reward(prev_action, prev_price, current_price, num_stocks_held):
    if prev_action == 0:  # buy
        reward = max(0, current_price - prev_price)  # potential profit
    elif prev_action == 1:  # sell
        reward = max(0, prev_price - current_price)  # loss avoidance
    else:
        reward = 0  # hold
    # additional reward (or penalty) for overall portfolio movement
    if num_stocks_held > 0:
        reward += (current_price - prev_price) * num_stocks_held
    return round(reward, 2)
# train DQN
def train_dqn(state_matrix, action_space, num_episodes, max_budget, update_freq=25):
    # init agent
    agent = DQNAgent(
        state_size=state_matrix.shape[1],
        action_size=len(action_space)
    )
    # init portfolio values, reward values, and total steps (for update frequency)
    profits = []
    rewards = []
    total_steps = 0
    # loop through episodes
    for episode in range(num_episodes):
        # init episode values
        total_reward = 0
        num_stocks_held = 0
        available_budget = max_budget
        state = state_matrix[0].reshape(1, -1)
        prev_portfolio_value = available_budget  # portfolio value at the start
        prev_action = None  # track the previous action for reward
        # init tracked actions
        buys = 0
        sells = 0
        # loop through the state matrix
        for t in range(1, len(state_matrix) - 1):
            action = agent.choose_action(state)  # choose action
            next_state = state_matrix[t + 1].reshape(1, -1)  # pick the next state
            current_price = state_matrix[t, 0]  # today's price
            prev_price = state_matrix[t - 1, 0]  # yesterday's price
            # calculate today's portfolio value
            current_portfolio_value = available_budget + (num_stocks_held * current_price)
            # calculate reward based on the updated reward function (i.e., what profit did yesterday's action get me)
            reward = calculate_reward(prev_action, prev_price, current_price, num_stocks_held)
            # track the previous action for reward calculation
            prev_action = action
            # update budget and inventory based on the action
            if action == 0 and available_budget >= current_price:  # buy if I have budget
                # adjust inventory, budget, and tracking
                num_stocks_held += 1
                available_budget -= current_price
                buys += 1
            elif action == 1 and num_stocks_held > 0:  # sell if I have inventory
                # adjust inventory, budget, and tracking
                num_stocks_held -= 1
                available_budget += current_price
                sells += 1
            done = (t == len(state_matrix) - 2)  # check if at the end of the episode
            # store the experience in agent memory
            agent.remember(state, action, reward, next_state, done)
            # run replay if the replay frequency is reached
            # NOTE: THIS HAD TO BE DONE IN ORDER TO SPEED UP PROCESSING TIME
            if total_steps % update_freq == 0:
                agent.replay()
            # update state and total reward
            total_reward += reward
            state = next_state
            total_steps += 1  # update frequency counter
            # update the portfolio value
            prev_portfolio_value = current_portfolio_value
        # calculate the final portfolio value
        final_portfolio_value = available_budget + (num_stocks_held * state_matrix[-1, 0])
        # record profit/portfolio values and total rewards
        profits.append(final_portfolio_value)
        rewards.append(total_reward)
        if episode % 50 == 0:
            print(
                f"Episode {episode + 1}, "
                f"# Stocks: {num_stocks_held}, "
                f"Total Rewards: {total_reward:.2f}, "
                f"Final Portfolio Value: {final_portfolio_value:.2f}, "
                f"Available Budget: {available_budget:.2f}, "
                f"Buys: {buys}, "
                f"Sells: {sells}"
            )
    return agent, profits, rewards
agent6, profits6, rewards6 = train_dqn(lagged_df.values, action_space=[0, 1, 2],
num_episodes=3000, max_budget=1000, update_freq=200)
Episode 1, # Stocks: 0, Total Rewards: 143.92, Final Portfolio Value: 1025.98, Available Budget: 1025.98, Buys: 59, Sells: 59 Episode 51, # Stocks: 5, Total Rewards: 180.01, Final Portfolio Value: 1116.37, Available Budget: 179.17, Buys: 55, Sells: 50 Episode 101, # Stocks: 6, Total Rewards: 242.74, Final Portfolio Value: 1184.22, Available Budget: 59.58, Buys: 36, Sells: 30 Episode 151, # Stocks: 5, Total Rewards: 432.52, Final Portfolio Value: 1295.03, Available Budget: 357.83, Buys: 38, Sells: 33 Episode 201, # Stocks: 6, Total Rewards: 265.08, Final Portfolio Value: 1229.96, Available Budget: 105.32, Buys: 25, Sells: 19 Episode 251, # Stocks: 0, Total Rewards: 421.89, Final Portfolio Value: 1241.67, Available Budget: 1241.67, Buys: 28, Sells: 28 Episode 301, # Stocks: 0, Total Rewards: 78.75, Final Portfolio Value: 945.16, Available Budget: 945.16, Buys: 29, Sells: 29 Episode 351, # Stocks: 5, Total Rewards: 166.61, Final Portfolio Value: 1025.15, Available Budget: 87.95, Buys: 14, Sells: 9 Episode 401, # Stocks: 0, Total Rewards: 490.29, Final Portfolio Value: 1298.68, Available Budget: 1298.68, Buys: 13, Sells: 13 Episode 451, # Stocks: 7, Total Rewards: 492.17, Final Portfolio Value: 1322.86, Available Budget: 10.78, Buys: 15, Sells: 8 Episode 501, # Stocks: 7, Total Rewards: 512.43, Final Portfolio Value: 1333.84, Available Budget: 21.76, Buys: 10, Sells: 3 Episode 551, # Stocks: 5, Total Rewards: 16.92, Final Portfolio Value: 1022.42, Available Budget: 85.22, Buys: 7, Sells: 2 Episode 601, # Stocks: 1, Total Rewards: 285.23, Final Portfolio Value: 1109.21, Available Budget: 921.77, Buys: 41, Sells: 40 Episode 651, # Stocks: 5, Total Rewards: 66.02, Final Portfolio Value: 1034.20, Available Budget: 97.00, Buys: 5, Sells: 0 Episode 701, # Stocks: 2, Total Rewards: 138.25, Final Portfolio Value: 1005.53, Available Budget: 630.65, Buys: 7, Sells: 5 Episode 751, # Stocks: 7, Total Rewards: 518.82, Final Portfolio Value: 1335.45, Available Budget: 23.37, Buys: 
8, Sells: 1 Episode 801, # Stocks: 5, Total Rewards: 149.30, Final Portfolio Value: 982.55, Available Budget: 45.35, Buys: 69, Sells: 64 Episode 851, # Stocks: 7, Total Rewards: 505.24, Final Portfolio Value: 1323.55, Available Budget: 11.47, Buys: 7, Sells: 0 Episode 901, # Stocks: 0, Total Rewards: 490.45, Final Portfolio Value: 1295.96, Available Budget: 1295.96, Buys: 8, Sells: 8 Episode 951, # Stocks: 7, Total Rewards: 518.86, Final Portfolio Value: 1334.27, Available Budget: 22.19, Buys: 7, Sells: 0 Episode 1001, # Stocks: 3, Total Rewards: 472.79, Final Portfolio Value: 1295.90, Available Budget: 733.58, Buys: 8, Sells: 5 Episode 1051, # Stocks: 7, Total Rewards: 518.86, Final Portfolio Value: 1334.27, Available Budget: 22.19, Buys: 7, Sells: 0 Episode 1101, # Stocks: 7, Total Rewards: 489.50, Final Portfolio Value: 1315.91, Available Budget: 3.83, Buys: 10, Sells: 3 Episode 1151, # Stocks: 7, Total Rewards: 502.91, Final Portfolio Value: 1323.55, Available Budget: 11.47, Buys: 7, Sells: 0 Episode 1201, # Stocks: 0, Total Rewards: 131.95, Final Portfolio Value: 1000.06, Available Budget: 1000.06, Buys: 1, Sells: 1 Episode 1251, # Stocks: 7, Total Rewards: 507.03, Final Portfolio Value: 1323.62, Available Budget: 11.54, Buys: 8, Sells: 1 Episode 1301, # Stocks: 5, Total Rewards: 37.57, Final Portfolio Value: 1034.64, Available Budget: 97.44, Buys: 6, Sells: 1 Episode 1351, # Stocks: 0, Total Rewards: 485.79, Final Portfolio Value: 1288.30, Available Budget: 1288.30, Buys: 8, Sells: 8 Episode 1401, # Stocks: 7, Total Rewards: 518.86, Final Portfolio Value: 1334.27, Available Budget: 22.19, Buys: 7, Sells: 0 Episode 1451, # Stocks: 7, Total Rewards: 527.10, Final Portfolio Value: 1338.39, Available Budget: 26.31, Buys: 8, Sells: 1 Episode 1501, # Stocks: 7, Total Rewards: 517.98, Final Portfolio Value: 1333.83, Available Budget: 21.75, Buys: 8, Sells: 1 Episode 1551, # Stocks: 0, Total Rewards: 0.00, Final Portfolio Value: 1000.00, Available Budget: 1000.00, 
Buys: 0, Sells: 0 Episode 1601, # Stocks: 5, Total Rewards: 178.33, Final Portfolio Value: 1032.64, Available Budget: 95.44, Buys: 6, Sells: 1 Episode 1651, # Stocks: 7, Total Rewards: 517.95, Final Portfolio Value: 1334.27, Available Budget: 22.19, Buys: 7, Sells: 0 Episode 1701, # Stocks: 7, Total Rewards: 516.52, Final Portfolio Value: 1333.40, Available Budget: 21.32, Buys: 8, Sells: 1 Episode 1751, # Stocks: 5, Total Rewards: 78.49, Final Portfolio Value: 1081.40, Available Budget: 144.20, Buys: 5, Sells: 0 Episode 1801, # Stocks: 7, Total Rewards: 503.98, Final Portfolio Value: 1326.83, Available Budget: 14.75, Buys: 10, Sells: 3 Episode 1851, # Stocks: 7, Total Rewards: 531.46, Final Portfolio Value: 1342.83, Available Budget: 30.75, Buys: 15, Sells: 8 Episode 1901, # Stocks: 0, Total Rewards: 491.31, Final Portfolio Value: 1296.39, Available Budget: 1296.39, Buys: 7, Sells: 7 Episode 1951, # Stocks: 0, Total Rewards: 486.83, Final Portfolio Value: 1294.15, Available Budget: 1294.15, Buys: 10, Sells: 10 Episode 2001, # Stocks: 0, Total Rewards: 156.28, Final Portfolio Value: 1001.41, Available Budget: 1001.41, Buys: 1, Sells: 1 Episode 2051, # Stocks: 7, Total Rewards: 518.86, Final Portfolio Value: 1334.27, Available Budget: 22.19, Buys: 8, Sells: 1 Episode 2101, # Stocks: 5, Total Rewards: 33.45, Final Portfolio Value: 1034.20, Available Budget: 97.00, Buys: 5, Sells: 0 Episode 2151, # Stocks: 7, Total Rewards: 517.19, Final Portfolio Value: 1334.27, Available Budget: 22.19, Buys: 7, Sells: 0 Episode 2201, # Stocks: 0, Total Rewards: 0.00, Final Portfolio Value: 1000.00, Available Budget: 1000.00, Buys: 0, Sells: 0 Episode 2251, # Stocks: 5, Total Rewards: 85.30, Final Portfolio Value: 1088.21, Available Budget: 151.01, Buys: 5, Sells: 0 Episode 2301, # Stocks: 0, Total Rewards: 118.02, Final Portfolio Value: 1001.49, Available Budget: 1001.49, Buys: 3, Sells: 3 Episode 2351, # Stocks: 1, Total Rewards: 49.65, Final Portfolio Value: 1052.23, Available 
Budget: 864.79, Buys: 1, Sells: 0 Episode 2401, # Stocks: 7, Total Rewards: 505.35, Final Portfolio Value: 1324.68, Available Budget: 12.60, Buys: 8, Sells: 1 Episode 2451, # Stocks: 7, Total Rewards: 518.86, Final Portfolio Value: 1334.27, Available Budget: 22.19, Buys: 7, Sells: 0 Episode 2501, # Stocks: 0, Total Rewards: 151.10, Final Portfolio Value: 998.82, Available Budget: 998.82, Buys: 2, Sells: 2 Episode 2551, # Stocks: 5, Total Rewards: 182.01, Final Portfolio Value: 1034.20, Available Budget: 97.00, Buys: 5, Sells: 0 Episode 2601, # Stocks: 0, Total Rewards: 153.46, Final Portfolio Value: 1000.00, Available Budget: 1000.00, Buys: 0, Sells: 0 Episode 2651, # Stocks: 0, Total Rewards: 153.46, Final Portfolio Value: 1000.00, Available Budget: 1000.00, Buys: 0, Sells: 0 Episode 2701, # Stocks: 0, Total Rewards: 20.57, Final Portfolio Value: 1005.54, Available Budget: 1005.54, Buys: 1, Sells: 1 Episode 2751, # Stocks: 4, Total Rewards: 58.31, Final Portfolio Value: 1058.64, Available Budget: 308.88, Buys: 5, Sells: 1 Episode 2801, # Stocks: 2, Total Rewards: 61.87, Final Portfolio Value: 1067.10, Available Budget: 692.22, Buys: 2, Sells: 0 Episode 2851, # Stocks: 0, Total Rewards: 151.38, Final Portfolio Value: 1000.00, Available Budget: 1000.00, Buys: 0, Sells: 0 Episode 2901, # Stocks: 7, Total Rewards: 518.86, Final Portfolio Value: 1334.27, Available Budget: 22.19, Buys: 7, Sells: 0 Episode 2951, # Stocks: 7, Total Rewards: 510.20, Final Portfolio Value: 1329.94, Available Budget: 17.86, Buys: 10, Sells: 3
print("Average Reward Value Over Training: $", round(np.mean(rewards6),2))
print("Maximum Reward Value Over Training: $", round(np.max(rewards6),2))
print("Minimum Reward Value Over Training: $", round(np.min(rewards6),2))
print("------------------------------------------------------------------")
print("Average Portfolio Value Over Training: $", round(np.mean(profits6),2))
print("Maximum Portfolio Value Over Training: $", round(np.max(profits6),2))
print("Minimum Portfolio Value Over Training: $", round(np.min(profits6),2))
Average Reward Value Over Training: $ 321.16 Maximum Reward Value Over Training: $ 726.16 Minimum Reward Value Over Training: $ -87.99 ------------------------------------------------------------------ Average Portfolio Value Over Training: $ 1180.25 Maximum Portfolio Value Over Training: $ 1551.58 Minimum Portfolio Value Over Training: $ 807.32
rewards = rewards6
profits = profits6
fig = go.Figure()
# reward line
fig.add_trace(go.Scatter(x=list(range(len(rewards))), y=rewards,
mode='lines',
name='Rewards',
line=dict(color='blue', dash='solid'))) # solid line with blue color
# portfolio line
fig.add_trace(go.Scatter(x=list(range(len(profits))), y=profits,
mode='lines',
name='Portfolio Values',
line=dict(color='red', dash='solid'))) # solid line with red color
# plot figure
fig.update_layout(
title='Q-Learning Profit Progression',
xaxis=dict(title='Episode Number'),
yaxis=dict(title='Profit', range=[-100, 5000]),
legend=dict(orientation='h', x=0.5, xanchor='center', y=-0.2),
plot_bgcolor='white',
yaxis_showgrid=True,
xaxis_showgrid=True
)
# Show the plot
fig.show()
With only a policy network, the training curves look somewhat more erratic. Comparing this to the policy-plus-target-network variant, the rewards and portfolio values swing over a wider range; my interpretation is that without a target network there is no fixed set of Q-value targets to stabilize learning, so each update chases a moving target.
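For contrast, the target-network mechanism this variant drops can be sketched in isolation. This is a hedged illustration over plain weight arrays (the names `hard_update`/`soft_update` and the toy weights are hypothetical, not the course implementation): a hard update copies the policy weights into the target on a fixed schedule, while a soft update blends them with a small factor tau.

```python
import numpy as np

def hard_update(policy_weights, target_weights):
    """Copy policy weights into the target network (periodic full sync)."""
    return [w.copy() for w in policy_weights]

def soft_update(policy_weights, target_weights, tau=0.01):
    """Blend policy weights into the target: target <- tau*policy + (1-tau)*target."""
    return [tau * pw + (1.0 - tau) * tw
            for pw, tw in zip(policy_weights, target_weights)]

policy = [np.ones((2, 2))]   # toy policy-network weights
target = [np.zeros((2, 2))]  # toy target-network weights
target = soft_update(policy, target, tau=0.1)  # target moves 10% toward policy
```

With Keras models the hard version would be `target_model.set_weights(model.get_weights())` called every N training steps.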
class DQNAgent:
def __init__(self, state_size, action_size, learning_rate=0.001,
discount_factor=0.99, epsilon=1.0, epsilon_decay=0.995,
epsilon_min=0.01, batch_size=32, memory_size=2000):
self.state_size = state_size
self.action_size = action_size
self.learning_rate = learning_rate
self.discount_factor = discount_factor
self.epsilon = epsilon
self.epsilon_decay = epsilon_decay
self.epsilon_min = epsilon_min
self.batch_size = batch_size
self.memory = deque(maxlen=memory_size)
# policy network only (no target network in this variation)
self.model = self._build_model() # policy network to predict q-values for action
# create the policy network (simple single layer model to expedite training and processing time)
def _build_model(self):
model = Sequential()
model.add(Dense(24, input_dim=self.state_size, activation='relu'))
#model.add(Dense(24, activation='relu')) # Added another layer for more complexity
model.add(Dense(self.action_size, activation='linear')) # output layer for action size
model.compile(loss='mse', optimizer=Adam(learning_rate=self.learning_rate))
return model
# store experiences in memory
def remember(self, state, action, reward, next_state, done):
self.memory.append((state, action, reward, next_state, done))
# choose action
def choose_action(self, state):
if np.random.rand() <= self.epsilon:
return np.random.choice(self.action_size) # EXPLORE
else:
q_values = self.model.predict(state) # EXPLOIT - predict using policy network
return np.argmax(q_values[0])
# experience replay to train the policy network using experiences from memory
def replay(self):
if len(self.memory) < self.batch_size:
return # stop replay if not enough memory
batch = random.sample(self.memory, self.batch_size) # sample from memory
for state, action, reward, next_state, done in batch:
q_target = reward # default reward
if not done: # if episode is not over, apply Bellman piecewise
future_q = np.amax(self.model.predict(next_state)[0]) # predict best q-value for next state using the policy network
q_target += self.discount_factor * future_q # apply Bellman
q_values = self.model.predict(state) # get current q-values for state using policy network
q_values[0][action] = q_target # update q-value for action with above calculated target q-value
self.model.fit(state, q_values, epochs=1, verbose=0) # train policy network on the updated q-values
# decay epsilon
if self.epsilon > self.epsilon_min:
self.epsilon *= self.epsilon_decay
# analyze action distribution
# NOTE this has nothing to do with algo just part of post-analysis
def analyze_action_distribution(self):
actions = [exp[1] for exp in self.memory] # get all actions from memory
unique, counts = np.unique(actions, return_counts=True) # get unique actions and counts
action_distribution = dict(zip(unique, counts))
return action_distribution
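Since epsilon decays multiplicatively once per replay call, it is worth checking how many decays it takes to reach the floor. A small standalone check, restating the decay rule with the agent's defaults above (epsilon=1.0, decay=0.995, floor=0.01):

```python
import math

epsilon, decay, floor = 1.0, 0.995, 0.01
steps = 0
while epsilon > floor:   # same stopping rule as the agent's replay()
    epsilon *= decay
    steps += 1
# closed form: smallest n with decay**n <= floor
print(steps, math.ceil(math.log(floor) / math.log(decay)))  # 919 919
```

So exploration takes roughly 919 replay calls to anneal fully, which is why throttling replay with update_freq also slows the decay schedule.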
# reward function: pays for buying before a rise and selling before a drop; the per-action term is clipped at zero (no negative action rewards), though the inventory term below can still be negative
def calculate_reward(prev_action, prev_price, current_price, num_stocks_held):
if prev_action == 0: # buy
reward = max(0, current_price - prev_price) # potential profit
elif prev_action == 1: # sell
reward = max(0, prev_price - current_price) # loss avoidance
else:
reward = 0 # hold
# additional reward for overall portfolio increase
if num_stocks_held > 0:
reward += (current_price - prev_price) * num_stocks_held
return round(reward, 2)
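A quick sanity check on the reward logic, restating calculate_reward from the cell above so the snippet runs standalone:

```python
def calculate_reward(prev_action, prev_price, current_price, num_stocks_held):
    if prev_action == 0:        # buy
        reward = max(0, current_price - prev_price)
    elif prev_action == 1:      # sell
        reward = max(0, prev_price - current_price)
    else:                       # hold
        reward = 0
    if num_stocks_held > 0:     # mark-to-market term on held inventory
        reward += (current_price - prev_price) * num_stocks_held
    return round(reward, 2)

print(calculate_reward(0, 100.0, 105.0, 2))  # buy before a rise: 5 + 5*2 = 15.0
print(calculate_reward(1, 105.0, 100.0, 0))  # sell before a drop: 5.0
print(calculate_reward(2, 100.0, 98.0, 3))   # hold into a drop: 0 + (-2*3) = -6.0
```

Note the action term is clipped at zero, but the inventory term can still push the total negative, as in the third case.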
# train DQN
def train_dqn(state_matrix, action_space, num_episodes, max_budget, update_freq=25):
# init agent
agent = DQNAgent(
state_size=state_matrix.shape[1],
action_size=len(action_space)
)
# init portfolio value, reward value and total steps (for update freq)
profits = []
rewards = []
total_steps = 0
# loop through episodes
for episode in range(num_episodes):
# init episode values
total_reward = 0
num_stocks_held = 0
available_budget = max_budget
state = state_matrix[0].reshape(1, -1)
prev_portfolio_value = available_budget # portfolio value at the start
prev_action = None # track previous action for reward
# init tracked actions
buys = 0
sells = 0
# loop through state matrix
for t in range(1, len(state_matrix) - 1):
action = agent.choose_action(state) # choose action
next_state = state_matrix[t + 1].reshape(1, -1) # pick next state
current_price = state_matrix[t, 0] # set today price
prev_price = state_matrix[t-1, 0] # set yesterday price
# calculate the portfolio value today
current_portfolio_value = available_budget + (num_stocks_held * current_price)
# calculate reward based on the updated reward function (i.e. what profit did action yesterday get me)
reward = calculate_reward(prev_action, prev_price, current_price, num_stocks_held)
# track the previous action for reward calculation
prev_action = action
# update budget and inventory based on the action
if action == 0 and available_budget >= current_price: # buy if i have budget
# adjust inventory, budget, and tracking
num_stocks_held += 1
available_budget -= current_price
buys += 1
elif action == 1 and num_stocks_held > 0: # sell if i have inventory
# adjust inventory, budget, and tracking
num_stocks_held -= 1
available_budget += current_price
sells += 1
done = (t == len(state_matrix) - 2) # check if at the end of episode
# store experience in agent memory
agent.remember(state, action, reward, next_state, done)
# experience replay is intentionally disabled for this variation
# (the network is never trained, so the agent acts on its initial weights plus epsilon-greedy exploration)
#if total_steps % update_freq == 0:
# agent.replay()
# update state and total reward
total_reward += reward
state = next_state
total_steps += 1 # update frequency counter
# update the portfolio value
prev_portfolio_value = current_portfolio_value
# calculate final portfolio value
final_portfolio_value = available_budget + (num_stocks_held * state_matrix[-1, 0])
# record profit/portfolio values and total rewards
profits.append(final_portfolio_value)
rewards.append(total_reward)
if episode % 50 == 0:
print(
f"Episode {episode + 1}, "
f"# Stocks: {num_stocks_held}, "
f"Total Rewards: {total_reward:.2f}, "
f"Final Portfolio Value: {final_portfolio_value:.2f}, "
f"Available Budget: {available_budget:.2f}, "
f"Buys: {buys}, "
f"Sells: {sells}"
)
return agent, profits, rewards
agent7, profits7, rewards7 = train_dqn(lagged_df.values, action_space=[0, 1, 2],
num_episodes=3000, max_budget=1000, update_freq=100)
Episode 1, # Stocks: 2, Total Rewards: 284.11, Final Portfolio Value: 1152.85, Available Budget: 777.97, Buys: 60, Sells: 58 Episode 51, # Stocks: 1, Total Rewards: 307.06, Final Portfolio Value: 1184.74, Available Budget: 997.30, Buys: 63, Sells: 62 Episode 101, # Stocks: 6, Total Rewards: 441.88, Final Portfolio Value: 1313.98, Available Budget: 189.34, Buys: 67, Sells: 61 Episode 151, # Stocks: 3, Total Rewards: 391.75, Final Portfolio Value: 1258.93, Available Budget: 696.61, Buys: 64, Sells: 61 Episode 201, # Stocks: 5, Total Rewards: 275.89, Final Portfolio Value: 1153.88, Available Budget: 216.68, Buys: 58, Sells: 53 Episode 251, # Stocks: 6, Total Rewards: 288.73, Final Portfolio Value: 1178.41, Available Budget: 53.78, Buys: 58, Sells: 52 Episode 301, # Stocks: 2, Total Rewards: 280.81, Final Portfolio Value: 1180.52, Available Budget: 805.64, Buys: 57, Sells: 55 Episode 351, # Stocks: 6, Total Rewards: 294.97, Final Portfolio Value: 1212.46, Available Budget: 87.82, Buys: 63, Sells: 57 Episode 401, # Stocks: 2, Total Rewards: 99.92, Final Portfolio Value: 1002.70, Available Budget: 627.82, Buys: 54, Sells: 52 Episode 451, # Stocks: 2, Total Rewards: 251.94, Final Portfolio Value: 1161.36, Available Budget: 786.48, Buys: 54, Sells: 52 Episode 501, # Stocks: 4, Total Rewards: 249.96, Final Portfolio Value: 1138.79, Available Budget: 389.03, Buys: 54, Sells: 50 Episode 551, # Stocks: 0, Total Rewards: 217.08, Final Portfolio Value: 1083.91, Available Budget: 1083.91, Buys: 67, Sells: 67 Episode 601, # Stocks: 6, Total Rewards: 284.21, Final Portfolio Value: 1199.12, Available Budget: 74.48, Buys: 65, Sells: 59 Episode 651, # Stocks: 5, Total Rewards: 338.83, Final Portfolio Value: 1209.19, Available Budget: 271.99, Buys: 67, Sells: 62 Episode 701, # Stocks: 5, Total Rewards: 198.92, Final Portfolio Value: 1108.85, Available Budget: 171.65, Buys: 62, Sells: 57 Episode 751, # Stocks: 3, Total Rewards: 133.21, Final Portfolio Value: 1027.17, Available Budget: 
464.85, Buys: 61, Sells: 58 Episode 801, # Stocks: 2, Total Rewards: 197.29, Final Portfolio Value: 1076.01, Available Budget: 701.13, Buys: 60, Sells: 58 Episode 851, # Stocks: 6, Total Rewards: 284.04, Final Portfolio Value: 1166.35, Available Budget: 41.71, Buys: 57, Sells: 51 Episode 901, # Stocks: 0, Total Rewards: 122.72, Final Portfolio Value: 1000.45, Available Budget: 1000.45, Buys: 56, Sells: 56 Episode 951, # Stocks: 2, Total Rewards: 280.43, Final Portfolio Value: 1148.41, Available Budget: 773.53, Buys: 69, Sells: 67 Episode 1001, # Stocks: 6, Total Rewards: 262.05, Final Portfolio Value: 1195.00, Available Budget: 70.36, Buys: 59, Sells: 53 Episode 1051, # Stocks: 4, Total Rewards: 362.65, Final Portfolio Value: 1232.49, Available Budget: 482.73, Buys: 58, Sells: 54 Episode 1101, # Stocks: 4, Total Rewards: 128.77, Final Portfolio Value: 1027.23, Available Budget: 277.47, Buys: 59, Sells: 55 Episode 1151, # Stocks: 2, Total Rewards: 115.63, Final Portfolio Value: 1031.56, Available Budget: 656.68, Buys: 52, Sells: 50 Episode 1201, # Stocks: 6, Total Rewards: 276.19, Final Portfolio Value: 1156.10, Available Budget: 31.46, Buys: 68, Sells: 62 Episode 1251, # Stocks: 1, Total Rewards: 197.61, Final Portfolio Value: 1097.95, Available Budget: 910.51, Buys: 56, Sells: 55 Episode 1301, # Stocks: 3, Total Rewards: 284.50, Final Portfolio Value: 1176.36, Available Budget: 614.04, Buys: 62, Sells: 59 Episode 1351, # Stocks: 5, Total Rewards: 231.16, Final Portfolio Value: 1140.56, Available Budget: 203.36, Buys: 62, Sells: 57 Episode 1401, # Stocks: 0, Total Rewards: 149.26, Final Portfolio Value: 1036.93, Available Budget: 1036.93, Buys: 53, Sells: 53 Episode 1451, # Stocks: 5, Total Rewards: 117.96, Final Portfolio Value: 1041.63, Available Budget: 104.43, Buys: 58, Sells: 53 Episode 1501, # Stocks: 3, Total Rewards: 196.96, Final Portfolio Value: 1103.09, Available Budget: 540.77, Buys: 61, Sells: 58 Episode 1551, # Stocks: 4, Total Rewards: 273.35, Final 
Portfolio Value: 1181.81, Available Budget: 432.05, Buys: 58, Sells: 54 Episode 1601, # Stocks: 0, Total Rewards: 207.17, Final Portfolio Value: 1108.36, Available Budget: 1108.36, Buys: 61, Sells: 61 Episode 1651, # Stocks: 2, Total Rewards: 210.58, Final Portfolio Value: 1113.92, Available Budget: 739.04, Buys: 62, Sells: 60 Episode 1701, # Stocks: 0, Total Rewards: 199.31, Final Portfolio Value: 1093.13, Available Budget: 1093.13, Buys: 54, Sells: 54 Episode 1751, # Stocks: 2, Total Rewards: 252.98, Final Portfolio Value: 1161.74, Available Budget: 786.86, Buys: 60, Sells: 58 Episode 1801, # Stocks: 3, Total Rewards: 247.74, Final Portfolio Value: 1136.73, Available Budget: 574.41, Buys: 69, Sells: 66 Episode 1851, # Stocks: 4, Total Rewards: 226.82, Final Portfolio Value: 1119.56, Available Budget: 369.80, Buys: 66, Sells: 62 Episode 1901, # Stocks: 3, Total Rewards: 237.55, Final Portfolio Value: 1138.32, Available Budget: 576.00, Buys: 63, Sells: 60 Episode 1951, # Stocks: 4, Total Rewards: 282.41, Final Portfolio Value: 1161.97, Available Budget: 412.21, Buys: 59, Sells: 55 Episode 2001, # Stocks: 4, Total Rewards: 385.07, Final Portfolio Value: 1256.90, Available Budget: 507.14, Buys: 64, Sells: 60 Episode 2051, # Stocks: 2, Total Rewards: 162.90, Final Portfolio Value: 1047.41, Available Budget: 672.53, Buys: 53, Sells: 51 Episode 2101, # Stocks: 5, Total Rewards: 160.58, Final Portfolio Value: 1070.03, Available Budget: 132.83, Buys: 59, Sells: 54 Episode 2151, # Stocks: 1, Total Rewards: 308.42, Final Portfolio Value: 1195.30, Available Budget: 1007.86, Buys: 57, Sells: 56 Episode 2201, # Stocks: 2, Total Rewards: 234.32, Final Portfolio Value: 1122.51, Available Budget: 747.63, Buys: 57, Sells: 55 Episode 2251, # Stocks: 4, Total Rewards: 176.71, Final Portfolio Value: 1073.36, Available Budget: 323.60, Buys: 67, Sells: 63 Episode 2301, # Stocks: 3, Total Rewards: 355.12, Final Portfolio Value: 1236.05, Available Budget: 673.73, Buys: 61, Sells: 58 
Episode 2351, # Stocks: 5, Total Rewards: 237.95, Final Portfolio Value: 1125.33, Available Budget: 188.13, Buys: 62, Sells: 57 Episode 2401, # Stocks: 5, Total Rewards: 320.35, Final Portfolio Value: 1200.87, Available Budget: 263.67, Buys: 68, Sells: 63 Episode 2451, # Stocks: 1, Total Rewards: 202.47, Final Portfolio Value: 1063.36, Available Budget: 875.92, Buys: 58, Sells: 57 Episode 2501, # Stocks: 4, Total Rewards: 196.24, Final Portfolio Value: 1092.95, Available Budget: 343.19, Buys: 60, Sells: 56 Episode 2551, # Stocks: 2, Total Rewards: 305.37, Final Portfolio Value: 1169.46, Available Budget: 794.58, Buys: 67, Sells: 65 Episode 2601, # Stocks: 2, Total Rewards: 158.29, Final Portfolio Value: 1060.07, Available Budget: 685.19, Buys: 58, Sells: 56 Episode 2651, # Stocks: 3, Total Rewards: 298.17, Final Portfolio Value: 1201.16, Available Budget: 638.84, Buys: 59, Sells: 56 Episode 2701, # Stocks: 2, Total Rewards: 200.52, Final Portfolio Value: 1070.42, Available Budget: 695.54, Buys: 60, Sells: 58 Episode 2751, # Stocks: 4, Total Rewards: 156.30, Final Portfolio Value: 1075.75, Available Budget: 325.99, Buys: 61, Sells: 57 Episode 2801, # Stocks: 4, Total Rewards: 145.16, Final Portfolio Value: 1031.17, Available Budget: 281.41, Buys: 63, Sells: 59 Episode 2851, # Stocks: 1, Total Rewards: 300.40, Final Portfolio Value: 1165.20, Available Budget: 977.76, Buys: 65, Sells: 64 Episode 2901, # Stocks: 3, Total Rewards: 147.74, Final Portfolio Value: 1046.81, Available Budget: 484.49, Buys: 64, Sells: 61 Episode 2951, # Stocks: 1, Total Rewards: 210.62, Final Portfolio Value: 1078.29, Available Budget: 890.85, Buys: 62, Sells: 61
print("Average Reward Value Over Training: $", round(np.mean(rewards7),2))
print("Maximum Reward Value Over Training: $", round(np.max(rewards7),2))
print("Minimum Reward Value Over Training: $", round(np.min(rewards7),2))
print("------------------------------------------------------------------")
print("Average Portfolio Value Over Training: $", round(np.mean(profits7),2))
print("Maximum Portfolio Value Over Training: $", round(np.max(profits7),2))
print("Minimum Portfolio Value Over Training: $", round(np.min(profits7),2))
Average Reward Value Over Training: $ 237.94 Maximum Reward Value Over Training: $ 484.74 Minimum Reward Value Over Training: $ -12.9 ------------------------------------------------------------------ Average Portfolio Value Over Training: $ 1126.78 Maximum Portfolio Value Over Training: $ 1359.53 Minimum Portfolio Value Over Training: $ 910.6
rewards = rewards7
profits = profits7
fig = go.Figure()
# reward line
fig.add_trace(go.Scatter(x=list(range(len(rewards))), y=rewards,
mode='lines',
name='Rewards',
line=dict(color='blue', dash='solid'))) # solid line with blue color
# portfolio line
fig.add_trace(go.Scatter(x=list(range(len(profits))), y=profits,
mode='lines',
name='Portfolio Values',
line=dict(color='red', dash='solid'))) # solid line with red color
# plot figure
fig.update_layout(
title='Q-Learning Profit Progression',
xaxis=dict(title='Episode Number'),
yaxis=dict(title='Profit', range=[-100, 5000]),
legend=dict(orientation='h', x=0.5, xanchor='center', y=-0.2),
plot_bgcolor='white',
yaxis_showgrid=True,
xaxis_showgrid=True
)
# Show the plot
fig.show()
Without experience replay this is frankly stagnant: the neural network is never trained on stored experiences, so the agent is effectively acting on its initial weights. The curves are more stable, but the average portfolio value falls short of what we saw with experience replay and the double (policy and target) networks. The agent also does not converge; its behavior only looks less erratic because the Q-network is never retrained on randomly sampled experiences. In this configuration we might as well just use the Q-table method.
Adding regularization helps control overfitting, leading to a more robust and better-generalizing policy network for the DQN agent, which makes training more stable and increases the likelihood of convergence. L2 regularization penalizes large weights, and dropout (not included here, but added in part 4.5) randomly disables neurons so the remaining weights are forced to generalize. A dropout rate of 0.2 is a common starting point, while L2 strengths are typically smaller (0.01 here); setting either too high over-regularizes the network and hurts performance.
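As a concrete check of what l2(0.01) adds to the loss, here is the penalty term computed by hand for a toy weight matrix (numpy only; this restates the standard Keras L2 formula, loss += lambda * sum(w**2), with made-up weights rather than the trained model's):

```python
import numpy as np

lam = 0.01                      # regularization_strength used below
w = np.array([[0.5, -1.0],
              [2.0,  0.0]])     # toy kernel weights
penalty = lam * np.sum(w ** 2)  # Keras l2 adds lambda * sum of squared weights
print(round(penalty, 4))        # 0.01 * (0.25 + 1 + 4) = 0.0525
```

The penalty grows quadratically with weight magnitude, which is why it discourages the large weights that make Q-value estimates jumpy.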
class DQNAgent:
def __init__(self, state_size, action_size, learning_rate=0.001,
discount_factor=0.99, epsilon=1.0, epsilon_decay=0.995,
epsilon_min=0.01, batch_size=32, memory_size=2000):
self.state_size = state_size
self.action_size = action_size
self.learning_rate = learning_rate
self.discount_factor = discount_factor
self.epsilon = epsilon
self.epsilon_decay = epsilon_decay
self.epsilon_min = epsilon_min
self.batch_size = batch_size
self.memory = deque(maxlen=memory_size)
# policy network only (no target network in this variation)
self.model = self._build_model() # policy network to predict q-values for action
# create the policy network (simple single layer model to expedite training and processing time)
def _build_model(self):
regularization_strength = 0.01 # lambda
model = Sequential()
model.add(Dense(
24,
input_dim=self.state_size,
activation='relu',
kernel_regularizer=l2(regularization_strength)
))
# model.add(Dropout(0.2)) # dropout with 20% probability, we will keep commented out till next variation
model.add(Dense(
self.action_size,
activation='linear',
kernel_regularizer=l2(regularization_strength) # L2 on output layer
))
model.compile(loss='mse', optimizer=Adam(learning_rate=self.learning_rate))
return model
# store experiences in memory
def remember(self, state, action, reward, next_state, done):
self.memory.append((state, action, reward, next_state, done))
# choose action
def choose_action(self, state):
if np.random.rand() <= self.epsilon:
return np.random.choice(self.action_size) # EXPLORE
else:
q_values = self.model.predict(state) # EXPLOIT - predict using policy network
return np.argmax(q_values[0])
# experience replay to train the policy network using experiences from memory
def replay(self):
if len(self.memory) < self.batch_size:
return # stop replay if not enough memory
batch = random.sample(self.memory, self.batch_size) # sample from memory
for state, action, reward, next_state, done in batch:
q_target = reward # default reward
if not done: # if episode is not over, apply Bellman piecewise
future_q = np.amax(self.model.predict(next_state)[0]) # predict best q-value for next state using the policy network
q_target += self.discount_factor * future_q # apply Bellman
q_values = self.model.predict(state) # get current q-values for state using policy network
q_values[0][action] = q_target # update q-value for action with above calculated target q-value
self.model.fit(state, q_values, epochs=1, verbose=0) # train policy network on the updated q-values
# decay epsilon
if self.epsilon > self.epsilon_min:
self.epsilon *= self.epsilon_decay
# analyze action distribution
# NOTE this has nothing to do with algo just part of post-analysis
def analyze_action_distribution(self):
actions = [exp[1] for exp in self.memory] # get all actions from memory
unique, counts = np.unique(actions, return_counts=True) # get unique actions and counts
action_distribution = dict(zip(unique, counts))
return action_distribution
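The analyze_action_distribution method boils down to counting actions with np.unique; a standalone miniature of the same tally (sample action history is made up):

```python
import numpy as np

actions = [0, 1, 1, 2, 0, 0]  # sample action history (0=buy, 1=sell, 2=hold)
unique, counts = np.unique(actions, return_counts=True)
# cast to plain ints so the dict prints cleanly across numpy versions
distribution = {int(a): int(c) for a, c in zip(unique, counts)}
print(distribution)  # {0: 3, 1: 2, 2: 1}
```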
# reward function: pays for buying before a rise and selling before a drop; the per-action term is clipped at zero (no negative action rewards), though the inventory term below can still be negative
def calculate_reward(prev_action, prev_price, current_price, num_stocks_held):
if prev_action == 0: # buy
reward = max(0, current_price - prev_price) # potential profit
elif prev_action == 1: # sell
reward = max(0, prev_price - current_price) # loss avoidance
else:
reward = 0 # hold
# additional reward for overall portfolio increase
if num_stocks_held > 0:
reward += (current_price - prev_price) * num_stocks_held
return round(reward, 2)
# train DQN
def train_dqn(state_matrix, action_space, num_episodes, max_budget, update_freq=25):
# init agent
agent = DQNAgent(
state_size=state_matrix.shape[1],
action_size=len(action_space)
)
# init portfolio value, reward value and total steps (for update freq)
profits = []
rewards = []
total_steps = 0
# loop through episodes
for episode in range(num_episodes):
# init episode values
total_reward = 0
num_stocks_held = 0
available_budget = max_budget
state = state_matrix[0].reshape(1, -1)
prev_portfolio_value = available_budget # portfolio value at the start
prev_action = None # track previous action for reward
# init tracked actions
buys = 0
sells = 0
# loop through state matrix
for t in range(1, len(state_matrix) - 1):
action = agent.choose_action(state) # choose action
next_state = state_matrix[t + 1].reshape(1, -1) # pick next state
current_price = state_matrix[t, 0] # set today price
prev_price = state_matrix[t-1, 0] # set yesterday price
# calculate the portfolio value today
current_portfolio_value = available_budget + (num_stocks_held * current_price)
# calculate reward based on the updated reward function (i.e. what profit did action yesterday get me)
reward = calculate_reward(prev_action, prev_price, current_price, num_stocks_held)
# track the previous action for reward calculation
prev_action = action
# update budget and inventory based on the action
if action == 0 and available_budget >= current_price: # buy if i have budget
# adjust inventory, budget, and tracking
num_stocks_held += 1
available_budget -= current_price
buys += 1
elif action == 1 and num_stocks_held > 0: # sell if i have inventory
# adjust inventory, budget, and tracking
num_stocks_held -= 1
available_budget += current_price
sells += 1
done = (t == len(state_matrix) - 2) # check if at the end of episode
# store experience in agent memory
agent.remember(state, action, reward, next_state, done)
# run replay if replay running frequency is reached
# NOTE THIS HAD TO BE DONE IN ORDER TO SPEED UP PROCESSING TIME
if total_steps % update_freq == 0:
agent.replay()
# update state and total reward
total_reward += reward
state = next_state
total_steps += 1 # update frequency counter
# update the portfolio value
prev_portfolio_value = current_portfolio_value
# calculate final portfolio value
final_portfolio_value = available_budget + (num_stocks_held * state_matrix[-1, 0])
# record profit/portfolio values and total rewards
profits.append(final_portfolio_value)
rewards.append(total_reward)
if episode % 50 == 0:
print(
f"Episode {episode + 1}, "
f"# Stocks: {num_stocks_held}, "
f"Total Rewards: {total_reward:.2f}, "
f"Final Portfolio Value: {final_portfolio_value:.2f}, "
f"Available Budget: {available_budget:.2f}, "
f"Buys: {buys}, "
f"Sells: {sells}"
)
return agent, profits, rewards
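The budget/inventory bookkeeping inside the episode loop can be isolated and checked on a synthetic price series with a fixed action script. This is a simplified sketch of just the accounting (the guard conditions match the training loop; the agent itself is not involved):

```python
import numpy as np

def run_script(prices, actions, budget):
    """Replay a fixed action script (0=buy, 1=sell, 2=hold) over a price
    series using the same guard conditions as the training loop."""
    held = 0
    for price, action in zip(prices, actions):
        if action == 0 and budget >= price:   # buy only if affordable
            held += 1
            budget -= price
        elif action == 1 and held > 0:        # sell only if holding
            held -= 1
            budget += price
    # final portfolio value marks open positions at the last price
    return budget + held * prices[-1]

prices = np.array([10.0, 12.0, 11.0, 15.0])
# buy at 10, buy at 12, hold, sell at 15; one share left, marked at 15
final = run_script(prices, [0, 0, 2, 1], budget=100.0)
print(final)  # 100 - 10 - 12 + 15 + 1*15 = 108.0
```

The guards matter: an all-sell script with no inventory leaves the budget untouched, which is exactly why "Sells" can never exceed "Buys" in the episode printouts.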
agent8, profits8, rewards8 = train_dqn(lagged_df.values, action_space=[0, 1, 2],
num_episodes=5000, max_budget=1000, update_freq=100)
Episode 1, # Stocks: 6, Total Rewards: 189.39, Final Portfolio Value: 1131.99, Available Budget: 7.35, Buys: 54, Sells: 48
Episode 51, # Stocks: 4, Total Rewards: 165.59, Final Portfolio Value: 1080.52, Available Budget: 330.76, Buys: 44, Sells: 40
Episode 101, # Stocks: 5, Total Rewards: 142.22, Final Portfolio Value: 1047.16, Available Budget: 109.96, Buys: 29, Sells: 24
Episode 151, # Stocks: 7, Total Rewards: 416.94, Final Portfolio Value: 1320.05, Available Budget: 7.97, Buys: 17, Sells: 10
Episode 201, # Stocks: 5, Total Rewards: 173.09, Final Portfolio Value: 1038.56, Available Budget: 101.36, Buys: 11, Sells: 6
Episode 251, # Stocks: 7, Total Rewards: 551.86, Final Portfolio Value: 1365.56, Available Budget: 53.48, Buys: 16, Sells: 9
Episode 301, # Stocks: 5, Total Rewards: 89.28, Final Portfolio Value: 1088.44, Available Budget: 151.24, Buys: 8, Sells: 3
Episode 351, # Stocks: 6, Total Rewards: 228.63, Final Portfolio Value: 1148.59, Available Budget: 23.95, Buys: 20, Sells: 14
[... intermediate episode logs truncated for readability ...]
Episode 4801, # Stocks: 7, Total Rewards: 518.58, Final Portfolio Value: 1334.27, Available Budget: 22.19, Buys: 7, Sells: 0
Episode 4851, # Stocks: 7, Total Rewards: 518.86, Final Portfolio Value: 1334.27, Available Budget: 22.19, Buys: 7, Sells: 0
Episode 4901, # Stocks: 7, Total Rewards: 518.84, Final Portfolio Value: 1334.37, Available Budget: 22.29, Buys: 11, Sells: 4
Episode 4951, # Stocks: 7, Total Rewards: 508.08, Final Portfolio Value: 1323.55, Available Budget: 11.47, Buys: 7, Sells: 0
print("Average Reward Value Over Training: $", round(np.mean(rewards8),2))
print("Maximum Reward Value Over Training: $", round(np.max(rewards8),2))
print("Minimum Reward Value Over Training: $", round(np.min(rewards8),2))
print("------------------------------------------------------------------")
print("Average Portfolio Value Over Training: $", round(np.mean(profits8),2))
print("Maximum Portfolio Value Over Training: $", round(np.max(profits8),2))
print("Minimum Portfolio Value Over Training: $", round(np.min(profits8),2))
Average Reward Value Over Training: $ 401.74
Maximum Reward Value Over Training: $ 713.9
Minimum Reward Value Over Training: $ -114.86
------------------------------------------------------------------
Average Portfolio Value Over Training: $ 1245.83
Maximum Portfolio Value Over Training: $ 1487.78
Minimum Portfolio Value Over Training: $ 791.72
rewards = rewards8
profits = profits8
fig = go.Figure()
# reward line
fig.add_trace(go.Scatter(x=list(range(len(rewards))), y=rewards,
mode='lines',
name='Rewards',
line=dict(color='blue', dash='solid'))) # solid line with blue color
# portfolio line
fig.add_trace(go.Scatter(x=list(range(len(profits))), y=profits,
mode='lines',
name='Portfolio Values',
line=dict(color='red', dash='solid'))) # solid line with red color
# plot figure
fig.update_layout(
title='Q-Learning Profit Progression',
xaxis=dict(title='Episode Number'),
yaxis=dict(title='Profit', range=[-100, 5000]),
legend=dict(orientation='h', x=0.5, xanchor='center', y=-0.2),
plot_bgcolor='white',
yaxis_showgrid=True,
xaxis_showgrid=True
)
# show the plot
fig.show()
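The raw per-episode curves are noisy, which makes stability hard to judge by eye. A rolling mean helps; this is a small sketch (assuming `rewards` is a list like the one produced above, demonstrated here on a synthetic curve):

```python
import numpy as np

def rolling_mean(values, window=50):
    """Simple moving average; output has len(values) - window + 1 points."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode='valid')

# demo on a synthetic reward curve: noise around an upward trend
rng = np.random.default_rng(0)
demo = np.linspace(0, 500, 1000) + rng.normal(0, 50, 1000)
smooth = rolling_mean(demo, window=50)
print(len(smooth))  # 951 points: 1000 - 50 + 1
```

The smoothed series can be added to the same Plotly figure with another `go.Scatter` trace if a cleaner trend line is wanted alongside the raw curves.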
Okay, this is actually not horrible: we see a generally higher average portfolio value and some stability in the curves toward the end. Note that this run used no dropout and only a small amount of regularization. My guess is that including both will substantially stabilize the training curve, so let's try that now.
Ok, frankly, I just wanted to test regularization on the larger two-network model to see whether it stabilizes the way we saw above.
class DQNAgent:
def __init__(self, state_size, action_size, learning_rate=0.001,
discount_factor=0.99, epsilon=1.0, epsilon_decay=0.995,
epsilon_min=0.01, batch_size=32, memory_size=2000,
target_update_freq=100):
self.state_size = state_size
self.action_size = action_size
self.learning_rate = learning_rate
self.discount_factor = discount_factor
self.epsilon = epsilon
self.epsilon_decay = epsilon_decay
self.epsilon_min = epsilon_min
self.batch_size = batch_size
self.memory = deque(maxlen=memory_size)
# main policy network and target network
self.model = self._build_model() # policy network to predict q-values for action
self.target_model = clone_model(self.model) # target network used for providing stable target q-values
self.update_target_model() # ensure the target network starts synchronized with the policy network
self.target_update_freq = target_update_freq # sets frequency at which weights are updated
# create the policy network (simple single layer model to expedite training and processing time)
def _build_model(self):
regularization_strength = 0.01 # lambda weight decay penalizes larger weights
model = Sequential()
model.add(Dense(24, input_dim=self.state_size, activation='relu', kernel_regularizer=l2(regularization_strength)))
model.add(Dropout(0.2)) # dropout randomly zeroes neurons at 20% probability during training to reduce over-reliance on specific neurons
model.add(Dense(self.action_size, activation='linear', kernel_regularizer=l2(regularization_strength))) # L2 on output layer
model.compile(loss='mse', optimizer=Adam(learning_rate=self.learning_rate))
return model
# update the target network weights with the policy network weights at a specified interval
def update_target_model(self):
self.target_model.set_weights(self.model.get_weights())
# store experiences in memory
def remember(self, state, action, reward, next_state, done):
self.memory.append((state, action, reward, next_state, done))
# choose action
def choose_action(self, state):
if np.random.rand() <= self.epsilon:
return np.random.choice(self.action_size) # EXPLORE
else:
q_values = self.model.predict(state) # EXPLOIT - predict using policy network
return np.argmax(q_values[0])
# experience replay to train the policy network using experiences from memory
def replay(self):
if len(self.memory) < self.batch_size:
return # stop replay if not enough memory
batch = random.sample(self.memory, self.batch_size) # sample from memory
for state, action, reward, next_state, done in batch:
q_target = reward # default reward
if not done: # if episode is not over, apply the Bellman update
future_q = np.amax(self.target_model.predict(next_state)[0]) # predict best q-value for next state using target
q_target += self.discount_factor * future_q # apply Bellman
q_values = self.model.predict(state) # get current q-values for state using policy network
q_values[0][action] = q_target # update q-value for action with above calculated target q-value
self.model.fit(state, q_values, epochs=1, verbose=0) # train policy network on the updated q-values
# decay epsilon
if self.epsilon > self.epsilon_min:
self.epsilon *= self.epsilon_decay
# analyze action distribution
# NOTE this has nothing to do with algo just part of post-analysis
def analyze_action_distribution(self):
actions = [exp[1] for exp in self.memory] # get all actions from memory
unique, counts = np.unique(actions, return_counts=True) # get unique actions and counts
action_distribution = dict(zip(unique, counts))
return action_distribution
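One practical check on these hyperparameters: epsilon only decays once per `replay()` call, and replay runs every `update_freq` environment steps, so the exploration phase lasts longer than the raw decay rate suggests. A small standalone calculation of the schedule (numbers match the defaults above):

```python
# Count decay steps until epsilon_min is reached (standalone sketch of the
# schedule in DQNAgent.replay; defaults: 1.0 -> 0.01 at rate 0.995).
epsilon, epsilon_decay, epsilon_min = 1.0, 0.995, 0.01
decay_calls = 0
while epsilon > epsilon_min:
    epsilon *= epsilon_decay
    decay_calls += 1
print(decay_calls)  # ~919 replay() calls to reach epsilon_min

# With update_freq=100, that is roughly 919 * 100 environment steps of
# exploration before the policy becomes (mostly) greedy.
```

Given roughly 790 steps per episode here, epsilon stays elevated for well over a hundred episodes, which is consistent with the erratic early buys/sells counts in the printouts.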
# reward function: rewards buying low and selling high; the trade reward is clipped at zero (never negative), plus a mark-to-market term for held stock
def calculate_reward(prev_action, prev_price, current_price, num_stocks_held):
if prev_action == 0: # buy
reward = max(0, current_price - prev_price) # potential profit
elif prev_action == 1: # sell
reward = max(0, prev_price - current_price) # loss avoidance
else:
reward = 0 # hold
# additional reward for overall portfolio increase
if num_stocks_held > 0:
reward += (current_price - prev_price) * num_stocks_held
return round(reward, 2)
# train DQN
def train_dqn(state_matrix, action_space, num_episodes, max_budget, update_freq=25, target_update_freq=50):
# init agent
agent = DQNAgent(
state_size=state_matrix.shape[1],
action_size=len(action_space),
target_update_freq=target_update_freq
)
# init portfolio value, reward value and total steps (for update freq)
profits = []
rewards = []
total_steps = 0
# loop through episodes
for episode in range(num_episodes):
# init episode values
total_reward = 0
num_stocks_held = 0
available_budget = max_budget
state = state_matrix[0].reshape(1, -1)
prev_portfolio_value = available_budget # portfolio value at the start
prev_action = None # track previous action for reward
# init tracked actions
buys = 0
sells = 0
# loop through state matrix
for t in range(1, len(state_matrix) - 1):
action = agent.choose_action(state) # choose action
next_state = state_matrix[t + 1].reshape(1, -1) # pick next state
current_price = state_matrix[t, 0] # set today price
prev_price = state_matrix[t-1, 0] # set yesterday price
# calculate the portfolio value today
current_portfolio_value = available_budget + (num_stocks_held * current_price)
# calculate reward based on the updated reward function (i.e. what profit did action yesterday get me)
reward = calculate_reward(prev_action, prev_price, current_price, num_stocks_held)
# track the previous action for reward calculation
prev_action = action
# update budget and inventory based on the action
if action == 0 and available_budget >= current_price: # buy if i have budget
# adjust inventory, budget, and tracking
num_stocks_held += 1
available_budget -= current_price
buys += 1
elif action == 1 and num_stocks_held > 0: # sell if i have inventory
# adjust inventory, budget, and tracking
num_stocks_held -= 1
available_budget += current_price
sells += 1
# code needs to process the next state, which would not be possible on the very last step.
done = (t == len(state_matrix) - 2) # check if at the end of episode
# store experience in agent memory
agent.remember(state, action, reward, next_state, done)
# run replay if replay running frequency is reached
# NOTE THIS HAD TO BE DONE IN ORDER TO SPEED UP PROCESSING TIME
if total_steps % update_freq == 0:
agent.replay()
# update target network weights from the policy network if the update frequency is reached
if total_steps % agent.target_update_freq == 0:
agent.update_target_model()
# update state and total reward
total_reward += reward
state = next_state
total_steps += 1 # update frequency counter
# update the portfolio value
prev_portfolio_value = current_portfolio_value
# calculate final portfolio value
final_portfolio_value = available_budget + (num_stocks_held * state_matrix[-1, 0])
# record profit/portfolio values and total rewards
profits.append(final_portfolio_value)
rewards.append(total_reward)
if episode % 50 == 0:
print(
f"Episode {episode + 1}, "
f"# Stocks: {num_stocks_held}, "
f"Total Rewards: {total_reward:.2f}, "
f"Final Portfolio Value: {final_portfolio_value:.2f}, "
f"Available Budget: {available_budget:.2f}, "
f"Buys: {buys}, "
f"Sells: {sells}"
)
return agent, profits, rewards
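The interaction between per-step training and periodic hard target syncs can be illustrated without Keras: treat the two networks as plain weight arrays and check that the target copy stays frozen between syncs. An illustrative sketch, not the actual training loop:

```python
import numpy as np

policy = np.zeros(4)            # stand-in for the policy network weights
target = policy.copy()          # stand-in for the target network
target_update_freq = 50

syncs = 0
for step in range(200):
    policy += 0.1               # policy "trains" every step
    if step % target_update_freq == 0:
        target = policy.copy()  # hard sync, as in update_target_model()
        syncs += 1

print(syncs)                    # 4 syncs in 200 steps (steps 0, 50, 100, 150)
# between syncs the target lags the policy; that frozen lag is what keeps
# the TD targets in replay() from chasing a moving objective
print(float(policy[0] - target[0]))
```

The lag between the arrays at the end is exactly the drift accumulated since the last sync, which is the stability mechanism the double-network setup relies on.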
agent9, profits9, rewards9 = train_dqn(lagged_df.values, action_space=[0, 1, 2],
num_episodes=3000, max_budget=1000, update_freq=100, target_update_freq=200)
Episode 1, # Stocks: 2, Total Rewards: 209.54, Final Portfolio Value: 1106.03, Available Budget: 731.15, Buys: 50, Sells: 48
Episode 51, # Stocks: 7, Total Rewards: 437.25, Final Portfolio Value: 1315.05, Available Budget: 2.97, Buys: 40, Sells: 33
Episode 101, # Stocks: 6, Total Rewards: 467.58, Final Portfolio Value: 1301.47, Available Budget: 176.83, Buys: 29, Sells: 23
Episode 151, # Stocks: 7, Total Rewards: 533.31, Final Portfolio Value: 1349.67, Available Budget: 37.59, Buys: 19, Sells: 12
Episode 201, # Stocks: 6, Total Rewards: 460.45, Final Portfolio Value: 1292.75, Available Budget: 168.11, Buys: 14, Sells: 8
Episode 251, # Stocks: 7, Total Rewards: 510.40, Final Portfolio Value: 1332.04, Available Budget: 19.96, Buys: 14, Sells: 7
Episode 301, # Stocks: 7, Total Rewards: 520.56, Final Portfolio Value: 1338.52, Available Budget: 26.44, Buys: 10, Sells: 3
[... intermediate episode logs truncated for readability ...]
Episode 2851, # Stocks: 7, Total Rewards: 518.86, Final Portfolio Value: 1334.27, Available Budget: 22.19, Buys: 7, Sells: 0
Episode 2901, # Stocks: 7, Total Rewards: 520.96, Final Portfolio Value: 1335.32, Available Budget: 23.24, Buys: 8, Sells: 1
Episode 2951, # Stocks: 7, Total Rewards: 518.06, Final Portfolio Value: 1333.87, Available Budget: 21.79, Buys: 8, Sells: 1
print("Average Reward Value Over Training: $", round(np.mean(rewards9),2))
print("Maximum Reward Value Over Training: $", round(np.max(rewards9),2))
print("Minimum Reward Value Over Training: $", round(np.min(rewards9),2))
print("------------------------------------------------------------------")
print("Average Portfolio Value Over Training: $", round(np.mean(profits9),2))
print("Maximum Portfolio Value Over Training: $", round(np.max(profits9),2))
print("Minimum Portfolio Value Over Training: $", round(np.min(profits9),2))
Average Reward Value Over Training: $ 508.66
Maximum Reward Value Over Training: $ 545.16
Minimum Reward Value Over Training: $ 61.88
------------------------------------------------------------------
Average Portfolio Value Over Training: $ 1327.64
Maximum Portfolio Value Over Training: $ 1350.83
Minimum Portfolio Value Over Training: $ 932.14
rewards = rewards9
profits = profits9
fig = go.Figure()
# reward line
fig.add_trace(go.Scatter(x=list(range(len(rewards))), y=rewards,
mode='lines',
name='Rewards',
line=dict(color='blue', dash='solid'))) # solid line with blue color
# portfolio line
fig.add_trace(go.Scatter(x=list(range(len(profits))), y=profits,
mode='lines',
name='Portfolio Values',
line=dict(color='red', dash='solid'))) # solid line with red color
# plot figure
fig.update_layout(
title='Q-Learning Profit Progression',
xaxis=dict(title='Episode Number'),
yaxis=dict(title='Profit', range=[-100, 5000]),
legend=dict(orientation='h', x=0.5, xanchor='center', y=-0.2),
plot_bgcolor='white',
yaxis_showgrid=True,
xaxis_showgrid=True
)
# show the plot
fig.show()
This looks like it is converging well, with only a few small up and down ticks. This is likely due to the combination of the double networks and the regularization/dropout added to the Q-network. We also see an average portfolio value of $1327.64, the best average across the variations we have tried.
Although we have explored a number of reinforcement learning topics, there are still several improvements and avenues of future work that could be pursued: